ABSTRACT


The  purpose  of  the  work  reported  here  was  to  present  the  structure 
of  factor  analysis  to  a  physical  scientist  and  to  extend  the  structure 
where  it  was  weakest. 

The  reference  guide  in  the  appendix  performs  as  a  dynamic  survey 
of  factor  analysis  by  guiding  a  neophyte  factor  analyst  through  an 
application.  Reference  is  made  to  expanded  presentations  in  the  body 
of  the  report. 

The structure of factor analysis has been extended in the following
areas: effects of the number of observations, sampling effect, interpretation
of factors, and communality.
EVALUATION


The purpose of this work was to study techniques in factor analysis in
order to provide an objective and mathematical standard in the field. This
study was needed to make factor analysis a useful analytical tool for practicing
engineers and scientists. The areas investigated, which have made
factor analysis less attractive as an analytic tool, are: the problem of
communality estimates, the number of observations for a valid factor analysis,
uniqueness, and sampling effects on factor structure. Attempts to solve these
problems were made and were partially successful. The results of
this study are twofold:

(1) An attempt to explain mathematically the events occurring
during a factor analysis in a way that can be understood by engineers and scientists.
This in turn will allow a practicing engineer to make an objective decision
whether he can use factor analysis as an analytic tool.

(2) Once an engineer decides to use factor analysis in his work,
a handbook or reference guide is provided which outlines a step-by-step
procedure for conducting a factor analysis, starting with the construction
of his experiment and ending with aids to interpret results. Computer
program descriptions are also provided, including formats for inputting raw
data.

The results of this study have already been put to practice by members
of  EMIIH  in  constructing  an  experimental  classification  model  to  be  used  for 
automatic  dissemination  of  technical  documents  to  engineers  and  scientists 
in  RADC. 


RADC  Project  Engineer 


Section  I 


INTRODUCTION 

1.1  BRIEF  HISTORY

It is appropriate to begin the Introduction to this final report
with Truman Kelley's remarks made in his 1940 publication (Reference 1,
p. 120):

"There is no search for timeless, spaceless, populationless
truth in factor analysis; rather, it represents a simple,
straightforward  problem  of  description  in  several  dimensions 
of  a  definite  group  functioning  in  definite  manners,  and  he 
who  assumes  to  read  more  remote  verities  into  the  factorial 
outcome  is  certainly  doomed  to  disappointment." 

This particular passage was also selected by Harman (Reference 2,
p. 5) to emphasize the simplicity of the problem and the potential pitfalls
in understanding its solution. Regardless of what is done in
methodology or conceptual studies, an acceptance of the basic model
necessarily implies that the problem remains simple and the solution
remains ambiguous.

Since  factor  analysis  was  found  useful  around  the  turn  of  this 
century  by  a  psychologist,  Charles  Spearman,  and  described  mathematically 
by  a  statistician,  Karl  Pearson,  the  development  of  techniques  has 
more  or  less  followed  the  lines  of  the  empirical  school.  That  is, 
methods  to  obtain  factor  solutions  have  evolved  more  from  the  necessity 
of  describing  certain  underlying  psychological  entities  by  meaningful 
groups  of  hypothetical  constructs  than  from  an  application  of  advanced 
mathematical  ideas  to  the  basic  mathematical  problem.  As  a  consequence, 
factor analysis suffered from a lack of mathematical ordering of its
esoteric devices until Harry Harman, in close association with
Karl Holzinger, published in 1960 an excellent summary of most of
the significant factor analysis work which had been done to that time
(Reference 2). This book, Modern Factor Analysis, has been welcomed
and accepted by most of the factor analysis groups in this
country as a general reference guide useful in selecting an appropriate
method or set of methods. Its comparative presentations are very good.

The  intent  of  this  study  was  not  simply  a  reiteration  of  Harman's 
work  with,  perhaps,  a  few  more  up-to-date  details.  Rather  it  was  an 
investigation  into  a  few  of  the  unsolved,  classic  mathematical  problems 
with  a  demonstration  of  how  too  little  knowledge  of  necessary  assumptions 
concerning  these  problems  can  be  troublesome  and  at  times  devastating. 
Attempts  were  made  and  were  partially  successful  in  solving  the  problems 
of  communality  estimates,  number  of  observations  for  a  valid  factor 
analysis,  uniqueness,  and  sampling  error  effects  on  factor  structure. 

1.2  THE  MODEL  AND  SOME  MATRIC  NOTATIONS 

Factor analysis is concerned with the study of an array of
numbers which has certain properties and contains information about
linear relationships among sets of data points. This array is called
a correlation matrix and the numbers, or entries, are called correlation
coefficients. The array is so constructed that the number in the ith
row and jth column represents the correlation, or degree of linear
relationship (y = ax + b), between the ith and jth sets of data
points. For 5 sets of data points such an array might look like:
$$\begin{pmatrix}
1.0 & \cdot & .4 & \cdot & \cdot \\
\cdot & 1.0 & .1 & \cdot & \cdot \\
.4 & .1 & 1.0 & .2 & .1 \\
\cdot & \cdot & .2 & 1.0 & .8 \\
\cdot & \cdot & .1 & .8 & 1.0
\end{pmatrix}$$

(entries shown as $\cdot$ are not legible in the source)


Easily noted is that the number in the 4th row and 5th column is
the same as the number in the 5th row and 4th column, and, in fact, the
number in the ith row and jth column (call it $r_{ij}$) is the same
as the number in the jth row and ith column (call it $r_{ji}$). This
property of symmetry, as well as others, will be stated more formally
a little later but is worth noting in a preliminary discussion on the
classic problems and the model.
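
The construction of such an array, and its symmetry, can be sketched in a few lines. This is only an illustration: the three data sets are hypothetical, and the function names `pearson_r` and `correlation_matrix` are ours, not part of any program described in this report.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sets of data points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(variables):
    """Array whose (i, j) entry is the correlation between sets i and j."""
    return [[pearson_r(vi, vj) for vj in variables] for vi in variables]

# Hypothetical data: three sets of five data points each.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 7.8, 10.1],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
R = correlation_matrix(data)

for row in R:
    print("  ".join(f"{value:5.2f}" for value in row))
```

Each diagonal entry is the correlation of a set with itself, which is why unities appear there before any communality estimate is substituted.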

The  problems  treated  in  this  report  are  mostly  those  which  have 
caused  mathematicians  to  reject  factor  analysis  as  a  useful  analytical 
tool.  Many  of  the  reasons  for  rejection  are  unjustified — some  are 
justified. Those reasons which are unjustified concern the misunderstanding
or misuse of the basic model and/or assumptions necessary in
determining a "unique" solution.

The  basic  model  stated  simply  is  this:  given  a  correlation  matrix 
for  a  set  of  data  points  with  appropriately  selected  diagonal  values, 
determine  a  set  of  factors  (or  hypothetical  variables)  which  when 
linearly  combined  reproduce  the  original  set  of  data  points.  In  a 
sense, then, the model is the same as for multiple linear regression,
except that the independent variables are replaced by hypothetical variables.
The big difference, of course, is that the final synthesis of original
data points is complete for all variables in factor analysis and
complete only for the dependent variables in a regression.

Let us adopt the vector notation $X$ to mean an ordered sequence
of values, or elements, $(x_1, x_2, \ldots, x_N)$. Then in vector notation
the linearity of the model is seen to be

$$X_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m + a_j U_j,$$

where $X_j$ is the original set of observations, $F_1$ through $F_m$
are the hypothetical common variables, or factors, $U_j$ is the unique
factor, and the coefficients $a_{j1}$ through $a_{jm}$ and $a_j$ are those
loadings required to reproduce $X_j$. The "linearity" of the model cannot
be overemphasized. In most multivariate studies, it is at best a
crude approximation to inherent nonlinearities which occur in nature.
The model represents a compromise between synthesis accuracy and
computational feasibility, a compromise which is too often considered
inviolate for interpretation purposes.
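
A small simulation may make the linear model concrete. The sketch below rests on assumptions not stated above: the factors are drawn as uncorrelated standard normal variables, the loadings are hypothetical, and each unique loading is chosen so that the variable has unit variance. Under those assumptions the correlation between two variables should be close to the sum of products of their common-factor loadings.

```python
import math
import random

random.seed(0)

# Hypothetical loadings a_jp for 3 variables on m = 2 common factors.
loadings = [[0.8, 0.1], [0.7, 0.3], [0.2, 0.9]]
# Unique loadings a_j chosen so that every X_j has unit variance (assumption).
unique = [math.sqrt(1.0 - sum(a * a for a in row)) for row in loadings]

def simulate(n_obs):
    """Draw n_obs observations of each X_j from the linear factor model."""
    columns = [[] for _ in loadings]
    for _ in range(n_obs):
        f = [random.gauss(0, 1) for _ in range(2)]   # common factors F_1, F_2
        for j, row in enumerate(loadings):
            u = random.gauss(0, 1)                   # unique factor U_j
            x = sum(a * fp for a, fp in zip(row, f)) + unique[j] * u
            columns[j].append(x)
    return columns

def corr(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

X = simulate(20000)
model_r12 = sum(a * b for a, b in zip(loadings[0], loadings[1]))
sample_r12 = corr(X[0], X[1])
print(f"model r12 = {model_r12:.3f}, sample r12 = {sample_r12:.3f}")
```

The agreement is only approximate for any finite number of observations, which is the sampling question taken up later under uniqueness.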
Before outlining the classic problems let us digress an instant
to review some matrix algebra and notation. A set of vectors arranged
in such a manner that the elements of the vectors form rows and columns
is called a matrix and will be denoted by a capital letter, e.g. R.

To  illustrate  further  using  the  previous  example: 


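$$R = \begin{pmatrix}
1.0 & \cdot & .4 & \cdot & \cdot \\
\cdot & 1.0 & .1 & \cdot & \cdot \\
.4 & .1 & 1.0 & .2 & .1 \\
\cdot & \cdot & .2 & 1.0 & .8 \\
\cdot & \cdot & .1 & .8 & 1.0
\end{pmatrix}$$

(entries shown as $\cdot$ are not legible in the source)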


Note that the diagonal elements are ones, a classic problem we shall
dwell on shortly. The transpose $R^T$ of this matrix is simply the
matrix with its rows and columns interchanged such that a typical
element $r_{ij}$ becomes $r_{ji}$. A symmetric matrix is a matrix which is
the same as its transpose; $R^T = R$ in our example.

To  review  the  four  fundamental  matrix  operations: 

$$A + B = (a_{ij} + b_{ij})$$

$$A - B = (a_{ij} - b_{ij})$$

$$AB = \left( \sum_{k=1}^{N} a_{ik} b_{kj} \right), \quad \text{where } N \text{ is the number of columns in } A \text{ and rows in } B$$

$$cA = (c\,a_{ij})$$


The row order of a matrix is the number of rows of the matrix.
The column order is the number of columns. The determinant of a matrix
of order N is the summation defined as follows:

$$\det A = |A| = \sum_{k=1}^{N} (-1)^{i+k} a_{ik} \det A(i|k),$$

where $1 \le i \le N$ and $A(i|k)$ denotes the matrix with the ith row and
kth column removed. Starting with the determinant of a second order
square matrix (number of rows equals the number of columns), the idea
of using a determinant to define a determinant presents itself as being
the easiest to understand and illustrates the difficulty of deriving
another definition. A matrix is singular if det A = 0.
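
The recursive character of this definition is easy to see in code. The following is a minimal sketch, not an efficient method; the names `minor` and `det` and the example matrix are ours, and indices are 0-based, which leaves the sign $(-1)^{i+k}$ unchanged.

```python
def minor(a, i, k):
    """A(i|k): the matrix a with row i and column k removed (0-based indices)."""
    return [row[:k] + row[k + 1:] for r, row in enumerate(a) if r != i]

def det(a, i=0):
    """Determinant by expansion along row i, following the recursive definition."""
    n = len(a)
    if n == 1:
        return a[0][0]          # a matrix of order 1 is its own determinant
    return sum((-1) ** (i + k) * a[i][k] * det(minor(a, i, k)) for k in range(n))

# Hypothetical example matrix of order 3.
A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]

# The expansion gives the same value whichever row i is chosen.
print([det(A, i) for i in range(3)])
```

The number of terms grows factorially with the order, so practical programs use elimination methods instead; the sketch is meant only to mirror the definition.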

Some  important  theorems  in  applying  matrix  theory  to  factor 
analysis  are: 


Theorem 1.1: $\det A^T = \det A$.

Theorem 1.2: If all elements of any column (or row) of A are zero, then det A = 0.

Theorem 1.3: If two columns (or rows) of A are proportional, then det A = 0.

Theorem 1.4: If A is square of order n, $\det(cA) = c^n \det A$.

Theorem 1.5: Let A be a square matrix of order n. Then the
system of homogeneous linear equations $A(x_1, x_2, \ldots, x_n) = AX = 0$
has a nontrivial solution if det A = 0.

If  we  delete  some  rows  and  columns  of  a  matrix  A,  the  remaining 
elements  form  a  submatrix  of  A.  A  square  submatrix  of  A  is  called 
principal  if  its  diagonal  is  part  of  the  diagonal  of  A.  The  rank 
of  A  is  the  order  of  the  largest  square  submatrix  of  A  whose 
determinant  is  nonzero. 
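
Theorems 1.1 through 1.4 and the definition of rank can be checked on small integer matrices. The sketch below is illustrative only: the matrices A and B are hypothetical, and the brute-force search over square submatrices follows the definition literally, so it is practical only for small orders.

```python
from itertools import combinations

def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[0][k] * det([r[:k] + r[k + 1:] for r in a[1:]])
               for k in range(len(a)))

def transpose(a):
    return [list(col) for col in zip(*a)]

def rank(a):
    """Order of the largest square submatrix whose determinant is nonzero."""
    n_rows, n_cols = len(a), len(a[0])
    for order in range(min(n_rows, n_cols), 0, -1):
        for rows in combinations(range(n_rows), order):
            for cols in combinations(range(n_cols), order):
                sub = [[a[r][c] for c in cols] for r in rows]
                if det(sub) != 0:
                    return order
    return 0

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]     # nonsingular
B = [[1, 2, 3], [2, 4, 6], [0, 1, 5]]     # first two rows proportional
c = 3
cA = [[c * x for x in row] for row in A]

print(det(A), det(transpose(A)))          # Theorem 1.1
print(det(B))                             # Theorem 1.3
print(det(cA), c ** 3 * det(A))           # Theorem 1.4
print(rank(A), rank(B))
```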

1.3  A  STATEMENT  OF  THE  CLASSIC  PROBLEMS 

As  we  mentioned  previously,  factor  analysis  is  concerned  with  the 
study,  or  factoring,  of  a  correlation  matrix.  Having  discussed  the 
logical mode of representing the matrix and associated items of
interest,  let  us  restate  the  factor  analysis  problem. 

Theorem 1.6: For every correlation matrix R there exists a
corresponding factor matrix F such that

$$F F^T = R.$$

Furthermore, 

Theorem  1.7:  There  exists  an  infinite  number  of  factor  matrices 
F  which  reproduce  any  given  correlation  matrix  R. 

The  problem,  then,  is  not  only  to  determine  F  but  to  find  an  F 
which  is  most  likely  to  satisfy  a  given  set  of  initial  conditions.  A 
factor  analysis  is  done  in  two  stages: 

Stage 1: Factoring problem. Find an F such that
$F F^T = R$ and also such that the column order
of F is the minimal rank of R.

Stage 2: Rotation problem. Rotate the arbitrary reference
frame into a "preferred" or "simplifying"
position.
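
Theorems 1.6 and 1.7 can be illustrated numerically. The sketch below uses a triangular (Cholesky) factoring, which is our choice for brevity and not a method put forward in this report; it works on a hypothetical correlation matrix with unities on the diagonal and does not attempt to minimize rank. Multiplying the resulting F by any orthogonal matrix T gives a different factor matrix that reproduces the same R.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def cholesky(r):
    """One possible factoring: lower-triangular F with F F^T = r."""
    n = len(r)
    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(f[i][k] * f[j][k] for k in range(j))
            if i == j:
                f[i][j] = math.sqrt(r[i][i] - s)
            else:
                f[i][j] = (r[i][j] - s) / f[j][j]
    return f

# Hypothetical correlation matrix for three variables.
R = [[1.0, 0.4, 0.1],
     [0.4, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
F = cholesky(R)

# An orthogonal rotation of the first two reference axes by an arbitrary angle.
theta = 0.7
T = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta),  math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
G = matmul(F, T)

def max_error(f):
    """Largest absolute difference between f f^T and R."""
    product = matmul(f, transpose(f))
    return max(abs(product[i][j] - R[i][j]) for i in range(3) for j in range(3))

print(max_error(F), max_error(G))
```

Since theta is arbitrary, every angle yields another admissible F, which is the infinitude referred to in Theorem 1.7 and the reason Stage 2 is needed.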

In  Stage  1  we  mentioned  the  minimal  rank  of  R.  Ordinarily  the 
rank  of  a  matrix  is  fixed  as  soon  as  its  elements  are  fixed.  However, 
the  diagonal  elements  of  R  have  special  meaning  in  that  they 
represent the total variance of each variable. Due to the description
of the model in terms of both common and unique factors, the total
variance can be split into common factor variance (communality) and
unique variance. The factor analysis of a correlation matrix with
communalities on the diagonal, the reduced correlation matrix, will
then yield only the common factor portion of the model. However, the
proportion of total variance ascribable to common factors is
generally unknown. Thus, the matrix R is incomplete at the outset of
a factor analysis. The communality problem consists of finding those
diagonal elements of R that minimize the rank of R.
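
The effect of the diagonal on rank can be seen in a constructed case. The sketch below assumes a hypothetical set of four variables that share exactly one common factor, so the off-diagonal correlations are products of loadings; the `rank` routine is a plain Gaussian elimination with a numerical tolerance, written here only for illustration.

```python
def rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue                      # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r

# Hypothetical loadings of four variables on a single common factor.
loadings = [0.9, 0.8, 0.7, 0.6]
n = len(loadings)

def correlation_matrix(diagonal):
    """Off-diagonal entries a_j * a_k; the diagonal is supplied by the caller."""
    return [[diagonal[j] if j == k else loadings[j] * loadings[k]
             for k in range(n)] for j in range(n)]

R_unities = correlation_matrix([1.0] * n)                  # total variance
R_reduced = correlation_matrix([a * a for a in loadings])  # communalities

print(rank(R_unities), rank(R_reduced))
```

With unities the matrix has full rank, while the correct communalities reduce it to the number of common factors. In practice the communalities are unknown, which is exactly the difficulty described above.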
Once the rank of R has been established, the number of common
factors is known and F can be determined by a variety of means (see
Section V). By Theorem 1.7, however, there are an
infinitude of F's which will do the job. The selection of the
solution configuration (in other words, the relative number of high
loadings per factor as well as the degree of relationship among factors)
is another classic problem. Probably the most commonly selected
configuration is one called simple structure, developed by
L. L. Thurstone, which offers the psychologist an optimal balance
between statistical simplicity and psychological utility. There is
little reason to believe that simple structure is of any real value
outside the domain of a very special class of problems; however,
intuitively it represents what may usually be desired in a factor
solution (see Section V for detail).

Solution  uniqueness  is  another  classic  problem  which  is  important 
in  defining  the  general  usefulness  of  factor  analysis.  Assuming  that 
a  solution  has  been  found  which  satisfies  a  given  class  of  constraints 
and  boundary  conditions,  what  can  we  say  about  the  uniqueness  of  this 
solution  compared  with  a  solution  derived  using  another  set  of  data 
points  from  the  same  multivariate  population?  Both  solutions  will  be 
identical  if  infinite  samples  are  used.  However,  from  a  practical 
viewpoint  only  a  finite  number  of  samples  are  possible  and,  in  most 
cases,  this  number  is  small.  Thus,  the  problem  of  uniqueness  is 
really an error analysis of sampling effects on bivariate statistics
and matrix operations.
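
The size of the sampling effect on a single correlation coefficient can be sketched by simulation. The sketch assumes a bivariate normal population with a hypothetical true correlation of 0.5; the sample sizes and the number of trials are arbitrary choices of ours.

```python
import math
import random
import statistics

random.seed(1)

def corr(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_r(n_obs, rho):
    """Correlation computed from one sample of n_obs pairs with true correlation rho."""
    xs = [random.gauss(0, 1) for _ in range(n_obs)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for x in xs]
    return corr(xs, ys)

def spread(n_obs, rho=0.5, trials=300):
    """Standard deviation of the sample correlation over repeated samples."""
    return statistics.stdev([sample_r(n_obs, rho) for _ in range(trials)])

spread_small = spread(10)
spread_large = spread(200)
print(f"N = 10: {spread_small:.3f}   N = 200: {spread_large:.3f}")
```

Every entry of R carries an error of this kind, and the errors propagate through the factoring and rotation stages, so two samples from the same population need not give the same factor structure.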

Solution  completeness  is  a  problem  which  involves  a  decision  to 
stop the factoring process after enough factors have been found (or
extracted). This decision can be made in many ways depending on the
kind  of  factor  structure  being  derived.  There  does  not  exist  a 
universal  completeness  criterion  and  the  problem  of  completeness  is 
often  thought  of  as  really  the  problem  of  communality  selection. 

In  Stage  2  the  rotation  problem  was  stated  as  being  one  of 
finding  a  reference  frame  which  provides  a  "preferred"  or  "simplified" 
position. The rotational aspects of factor analysis are the most
difficult to either understand or implement. This problem is by far
the  most  important  since  the  factor  analyst  has  an  infinitude  of 
reference  frames  at  his  disposal  from  which  he  is  to  select  one. 
Consider  a  similar  problem  whereby  a  point  (x,  y)  in  a  plane  is 
identified  by  its  position  relative  to  some  orthogonal  or  non- 
orthogonal  axes.  The  meaning  of  "preferred"  or  "simplified"  is 
indeed  vague  and  more  or  less  has  been  defined  by  the  analyst  as  a 
solution  which  fits  closest  to  his  hypothesized  factor  structure.  In 
the  case  of  a  psychologist  this  factor  structure  has  been  characterized 
by  Thurstone's  simple  structure.  There  are  other  structures  which 
can  be  used,  but  they  are  not  nearly  developed  to  the  extent  of 
Thurstone's  work. 

1.4  METHOD  OF  APPROACH  TO  THE  CLASSIC  PROBLEMS 

Naturally the classic problems (those problems which have defied
analytical solutions) cannot all be solved in one year of study.
The very implication would be most insulting to the scientists who
have spent lifetimes trying to clarify the intrinsic value of the
methods. However, the time is ripe to establish a mathematical
standard in factor analysis and provide mathematical explanations of
the infinite solution space phenomenon as it affects uniqueness and
other solution characteristics. The problems which have been
considered in this study are the following:

1.  communality  estimate  and  completeness 

2.  uniqueness 

3.  rotation  and  interpretation. 

The  communality  problem  was  approached  from  the  standpoint  of 
selecting  diagonal  elements  which  both  minimized  the  rank  of  R  and 
preserved  the  Gramian  property  (symmetric  and  the  determinants  of  all 
principal  submatrices  are  positive  or  zero).  Several  attempts  were 
made  using  various  iterative  schemes  and  the  technique  of  bordering 
has  been  found  to  solve  the  problem.  Details  of  this  technique  are 
given  in  Section  IV. 

The uniqueness problem was approached from two angles: perturbing
correlations and perturbing data. Correlations were bounded by
standard error intervals based on the sample size, and the effects
on the factor structure were noted. Data were randomized according to hypothetical
correlations and distribution functions, and new correlations were derived
and factored. Effects were again noted on the factor structure, and
empirical results are presented in Section XV.

The  rotation  and  interpretation  problem  was  approached  through 
regression  analysis  in  an  attempt  to  provide  a  measure  of  importance 
for  oblique  or  rotated  factor  loadings.  The  classic  problem  has  been 
to  identify  or  interpret  factors  using  the  factor  loadings.  Results 
are  presented  in  Section  V. 
Section  II


CORRELATION  THEORY 


2.1  INTRODUCTION 

In this section we shall concern ourselves with the basic unit of
factor analysis: the correlation coefficient. Factor analysis
amounts to factoring a certain matrix, the correlation matrix, whose
elements are the correlation coefficients. In 2.2 we shall
generally define correlation and the coefficient describing it. In 2.3
we will consider different types of bivariate correlation coefficients
and also compute examples. It will be seen that the coefficient most
commonly used is Pearson's product-moment correlation coefficient.
This coefficient will be interpreted geometrically in view of the factor
model in 2.4. Its statistical significance and reliability are then
discussed in 2.5. In 2.6 we consider how the product-moment correlation
coefficient can be derived if data are missing from one variable or the
other. In 2.7 we will touch briefly, for completeness, on the areas of
partial and multiple correlation coefficients.


2.2  DEFINITION  OF  CORRELATION  AND  THE  COEFFICIENT 

In factor analysis we are interested in the interrelationship of
different variables, which we then analyze. But first we have to have
a mathematical tool to express interrelationship between variables.
This tool is given by the correlation coefficient.

Denote by $X_j$ and $X_k$ two variables each having values for N
individuals. We first make the two variables comparable by deviating
them, that is, measuring their values from comparable zero points.
This is achieved by forming the deviates:

$$x_j = X_j - \bar{X}_j, \qquad \bar{X}_j = \frac{1}{N} \sum_{i=1}^{N} X_{ji} = \text{the mean of variable } X_j$$

and

$$x_k = X_k - \bar{X}_k, \qquad \bar{X}_k = \frac{1}{N} \sum_{i=1}^{N} X_{ki} = \text{the mean of variable } X_k.$$

Basically we assume that the relationship between variables $x_j$
and $x_k$ is linear, so that in plotting their paired values $(x_{ji}, x_{ki})$,
$i = 1, \ldots, N$, in a coordinate system, with the zero point at the means
of the two variables, we can ideally lay a straight line through these
points. It will, though, obviously not always be the case that the
points lie on a straight line. Then we try to fit a straight line to the
points. Expressing the points on the line by $\hat{x}_{ji}$, the line can
be described by

$$\hat{x}_{ji} = a x_{ki}, \qquad i = 1, \ldots, N,$$

where a is called the slope of the line. The slope shows the
relationship between $x_{ji}$ and $x_{ki}$, $i = 1, \ldots, N$. If a = 1,
$\hat{x}_{ji} = x_{ki}$ and the relationship is perfect; if a = 0, there
does not exist any relationship between $x_{ji}$ and $x_{ki}$, $i = 1, \ldots, N$.
So we are interested in a, which will later constitute our coefficient
of correlation.

$\hat{x}_{ji} = a x_{ki}$ fits a line to the points. The condition for
it is a "least square fit", that is,

$$\sum_{i=1}^{N} (x_{ji} - \hat{x}_{ji})^2 = \text{minimum}.$$

Then we have

$$\sum_{i=1}^{N} (x_{ji} - a x_{ki})^2 = \sum_{i=1}^{N} x_{ji}^2 - 2a \sum_{i=1}^{N} x_{ji} x_{ki} + a^2 \sum_{i=1}^{N} x_{ki}^2$$

with the condition to choose a so that this expression is a minimum.
Therefore we differentiate the expression with respect to a and set
the result equal to zero, obtaining

$$-2 \sum_{i=1}^{N} x_{ji} x_{ki} + 2a \sum_{i=1}^{N} x_{ki}^2 = 0.$$
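Therefore

$$a = \frac{\sum_{i=1}^{N} x_{ji} x_{ki}}{\sum_{i=1}^{N} x_{ki}^2}.$$

If the deviates are expressed in units of their standard deviations,
$z_{ji} = x_{ji}/s_j$ and $z_{ki} = x_{ki}/s_k$ with $s_j^2 = \frac{1}{N} \sum_{i=1}^{N} x_{ji}^2$,
the slope becomes

$$a = \frac{1}{N} \sum_{i=1}^{N} z_{ji} z_{ki} = \frac{\sum_{i=1}^{N} x_{ji} x_{ki}}{N s_j s_k} = r_{jk}.$$

$r_{jk}$ is called Pearson's product-moment correlation coefficient between
the standardized variables $z_j$ and $z_k$.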
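
The least-squares condition can be checked numerically. The sketch below assumes that standardizing means dividing each deviate by the standard deviation computed with N in the denominator; the data and the function names are hypothetical. For deviates the slope depends on the units of the two variables, while for standardized variables it is the same in both directions and equals the product-moment correlation.

```python
import math

def deviates(values):
    """Measure the values from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def ls_slope(xj, xk):
    """Slope a that minimizes the sum of (x_ji - a * x_ki)^2."""
    return sum(p * q for p, q in zip(xj, xk)) / sum(q * q for q in xk)

def standardize(values):
    """Deviates divided by the standard deviation (N in the denominator)."""
    d = deviates(values)
    s = math.sqrt(sum(v * v for v in d) / len(d))
    return [v / s for v in d]

# Hypothetical measurements on five individuals.
Xj = [2.0, 4.0, 5.0, 4.0, 5.0]
Xk = [1.0, 2.0, 3.0, 4.0, 5.0]

a_raw = ls_slope(deviates(Xj), deviates(Xk))
r_jk = ls_slope(standardize(Xj), standardize(Xk))
r_kj = ls_slope(standardize(Xk), standardize(Xj))
print(f"slope for deviates = {a_raw:.4f}, r_jk = {r_jk:.4f}, r_kj = {r_kj:.4f}")
```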


2.3  TYPES  OF  CORRELATION  COEFFICIENTS  -  BIVARIATE

In Section 2.2 we have derived Pearson's product-moment correlation
coefficient. Besides this correlation coefficient there exist still
other correlation coefficients, partly derivations from Pearson's r
to take care of a specific nature of the variables.

In the present section we want to summarize most of the important
correlation coefficients. We shall do this in a systematic way. So we
shall define in A. Kendall's General Γ-correlation Coefficient, from
which 1. Kendall's τ-, 2. Spearman's ρ-, and also 3. Pearson's
r-correlation Coefficients can be derived as special cases.

Next we shall consider in B. Correlation Coefficients for
Dichotomized Variables (i.e. variables which are given by their
frequencies in two classes). We shall discuss in 1. The Biserial
Correlation Coefficient (a correlation coefficient for two variables,
of which one is dichotomous and one has quantitative scores), in
2. The φ-coefficient (a correlation coefficient for two variables
which are both truly dichotomous), and in 3. The Tetrachoric
Correlation Coefficient (a coefficient for two variables which are both
dichotomized from underlying normal and continuous distributions).

In C. we shall briefly consider Miscellaneous Correlation
Coefficients by referring for the most part to some specific papers.
These coefficients will be 1. The Contingency Coefficient, 2. Yule's
Coefficient of Association and Yule's Coefficient of Colligation, and
3. Thorndike's Median Ratio Coefficient of Correlation.

Part D. then presents Examples of the aforementioned correlation
coefficients.

Our discussion of all correlation coefficients will be very brief,
mostly only a statement of the assumptions and of the basic definition. For
standard error formulas and correction formulas one is referred to the
references.

The answer to the question of which correlation coefficient one should
apply in a specific situation is given by the assumptions of the individual
coefficients, which are different for each coefficient.


A. Kendall's General Γ-correlation Coefficient

In the following we will consider the definition of the so-called
Kendall's general Γ-correlation coefficient (Reference 3). We will
state the necessary assumptions, the definition, and then we will derive
three correlation coefficients from this general correlation coefficient,
namely

(1) Kendall's τ-correlation coefficient,
(2) Spearman's ρ-correlation coefficient,
(3) Pearson's product-moment correlation coefficient r.


Assumptions: A sample of N objects (subjects, individuals,
observations, measurements) is considered relative to two properties
(continuous variables) X and Y, exhibiting values $X_1, \ldots, X_N$
and $Y_1, \ldots, Y_N$ according to X and Y. To any pair of individuals
i and j we will allot an X-score, denoted by $a_{ij}$, and a Y-score $b_{ij}$,
subject to $a_{ij} = -a_{ji}$, $b_{ij} = -b_{ji}$.
Definition 2.1: Kendall's general Γ-correlation coefficient is
defined as

$$\Gamma = \frac{\sum_{i,j=1}^{N} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{N} a_{ij}^2 \, \sum_{i,j=1}^{N} b_{ij}^2}}$$

with $a_{ij} = 0$ if i = j.

Now let us adopt three special methods of scoring and derive
Kendall's τ-correlation coefficient, Spearman's ρ-correlation coefficient,
and Pearson's product-moment correlation coefficient r.

1. Kendall's τ-correlation Coefficient

Assumptions: Suppose the values $X_1, \ldots, X_N$ are ranks, where we
adopt the following definition for the term rank:

Definition 2.2: If N objects are arranged in order according to
some property, which they all possess in a varying degree, the objects
are said to be ranked. Each object has a rank, expressed as a natural
number between 1 and N.

Denote them by $p_1, \ldots, p_N$. Correspondingly denote the ranks $Y_1, \ldots, Y_N$
by $q_1, \ldots, q_N$. Consider the pair of individuals i and j. Choose the
following scores:

$$a_{ij} = 1 \text{ if } p_i < p_j, \qquad b_{ij} = 1 \text{ if } q_i < q_j$$

and

$$a_{ij} = -1 \text{ if } p_i > p_j, \qquad b_{ij} = -1 \text{ if } q_i > q_j.$$
Considering then the denominator in


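Considering the numerator, with the scores $a_{ij} = p_j - p_i$ and $b_{ij} = q_j - q_i$,

$$\sum_{i=1}^{N} \sum_{j=1}^{N} (p_j - p_i)(q_j - q_i) = \sum_{i=1}^{N} \sum_{j=1}^{N} p_j q_j + \sum_{i=1}^{N} \sum_{j=1}^{N} p_i q_i - \sum_{i=1}^{N} \sum_{j=1}^{N} (p_i q_j + p_j q_i)$$

$$= 2N \sum_{i=1}^{N} p_i q_i - 2 \sum_{i=1}^{N} p_i \sum_{j=1}^{N} q_j$$

$$= 2N \sum_{i=1}^{N} p_i q_i - 2 \left[ \tfrac{1}{2} N (N + 1) \right]^2$$

$$= 2N \sum_{i=1}^{N} p_i q_i - \tfrac{1}{2} N^2 (1 + N)^2.$$

Denote by $S(d^2)$ the sum of the squared differences $p_i - q_i$, $i = 1, \ldots, N$. Then

$$S(d^2) = \sum_{i=1}^{N} (p_i - q_i)^2 = \sum_{i=1}^{N} p_i^2 - 2 \sum_{i=1}^{N} p_i q_i + \sum_{i=1}^{N} q_i^2 = 2 \sum_{i=1}^{N} p_i^2 - 2 \sum_{i=1}^{N} p_i q_i,$$

and therefore

$$\sum_{i=1}^{N} p_i q_i = \sum_{i=1}^{N} p_i^2 - \tfrac{1}{2} S(d^2).$$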


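We therefore obtain for the numerator

$$2N \sum_{i=1}^{N} p_i^2 - N S(d^2) - \tfrac{1}{2} N^2 (1 + N)^2 = \tfrac{1}{6} N^2 (N^2 - 1) - N S(d^2),$$

using $\sum_{i=1}^{N} p_i^2 = \tfrac{1}{6} N (N + 1)(2N + 1)$ for ranks.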


Remarks: Depending on the method of scoring the difference
between the observations i and j for one variable, one obtains from
Kendall's general Γ-correlation coefficient the τ-, ρ-, and r-coefficient.
The scoring for τ is the simplest one, assigning a 1 or -1
to this difference, thus not looking at all at how far apart the two
observations are. The scoring for ρ is more involved, taking into
account the actual difference of the observations by way of their ranking
difference. For this reason ρ can be considered as the product-moment
correlation coefficient between ranks. Scoring for r takes into
account all the information by way of the actual difference between the
measurements.

The choice of any one of the coefficients will depend on the
data available. If actual measurements for continuous variables are
available, r is preferable to ρ and τ. If only data in the form of
ranks are available, ρ is preferable to r.
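
The three scorings can be compared in a short sketch built on Definition 2.1. It is illustrative only: the data are hypothetical, the function names are ours, and the difference scorings ($p_j - p_i$ for ρ, $X_j - X_i$ for r) are written out from the verbal description in the remarks above.

```python
import math

def general_coefficient(xs, ys, score):
    """Definition 2.1: sum of a_ij * b_ij over sqrt(sum a_ij^2 * sum b_ij^2)."""
    n = len(xs)
    num = sa = sb = 0.0
    for i in range(n):
        for j in range(n):
            a, b = score(xs[i], xs[j]), score(ys[i], ys[j])
            num += a * b
            sa += a * a
            sb += b * b
    return num / math.sqrt(sa * sb)

def sign_score(u, v):
    """Scoring for tau: +1 if u < v, -1 if u > v, 0 if equal."""
    return (u < v) - (u > v)

def diff_score(u, v):
    """Scoring by the actual difference (ranks give rho, measurements give r)."""
    return v - u

def ranks(values):
    """Ranks 1..N of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical measurements on five individuals.
X = [1.2, 3.4, 2.2, 5.0, 4.1]
Y = [2.0, 2.9, 3.5, 6.1, 4.0]
p, q = ranks(X), ranks(Y)

tau = general_coefficient(p, q, sign_score)
rho = general_coefficient(p, q, diff_score)
r = general_coefficient(X, Y, diff_score)
print(f"tau = {tau:.3f}, rho = {rho:.3f}, r = {r:.3f}")
```

All three values come from the same formula; only the amount of information retained by the scores differs.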

Pearson's product-moment correlation coefficient is the most
important correlation coefficient for factor analysis, since the
assumptions made for its derivation (rectilinearity and continuity of
the variables) are the ones which are mostly
fulfilled by the variables involved in factor analysis.

B.  Correlation  Coefficients  for  Dichotomized  Variables 

1.  The  Biserial  Correlation  Coefficient 

Assumptions: Let $X_j$ and $X_k$ be two variables. Consider one
of them, say $X_j$, as dichotomous (or being reduced to dichotomy) under
the assumption, though, that it is really continuous, while we have only
categorical information. Assume further that the dichotomized variable
has a normal distribution, that the whole sample distribution is present,
and that the two tails of the distribution fit together into a whole
normal distribution. Looking only upon the two tails would make the
coefficient, which will now be defined, too high. Consider the second
variable $X_k$ as having quantitative scores, no assumption made about its
distribution. Assume a sample size of at least 50.

Denote the two categories of $X_j$ by $X_{j1}$ and $X_{j2}$. Let N be
the total number of individuals, the sum of the number of individuals
$N_1$ for $X_{j1}$ and of the number of individuals $N_2$ for $X_{j2}$.
Definition 2.3: The biserial correlation coefficient is defined
as

$$r_{bis} = \frac{M_p - M_t}{\sigma_t} \cdot \frac{p}{Y},$$

where the following notation is adopted (Reference 4):

$M_p$ = the mean score on $X_k$ of the individuals in category
$X_{j1}$ or $X_{j2}$, whichever is the larger

$M_t$ = the mean score on $X_k$ of the individuals in $X_{j1}$
and $X_{j2}$ together

$\sigma_t$ = the standard deviation of $X_k$ for the entire distribution

$p$ = the proportion $N_1/N$ or $N_2/N$, whichever corresponds
to the category with the higher mean on $X_k$

$Y$ = the ordinate at the point of truncation of the normal
distribution

Remarks:

a. If the dichotomized variable cannot be assumed to be continuous and normally distributed, Richardson and Stalnaker (Reference 5) suggest another form of the biserial correlation coefficient.

b. If one wants to consider only the two tails of the distribution, as is often the case in educational and sociological research (in other words, if one wants to consider so-called "widespread classes"), Peters and Van Voorhis (Reference 6) suggest a "biserial correlation coefficient from widespread classes".

c. Pearson (Reference 7) suggests a coefficient, called biserial eta, based on the assumption that one variable is given by alternative and the other by multiple categories.

2. The $\phi$-coefficient (Other name: Four-point Coefficient)


Assumptions: The two variables under consideration have to be truly dichotomous. Let $X_j$ and $X_k$ be two variables with categories $X_{j1}$, $X_{j2}$ and $X_{k1}$, $X_{k2}$ respectively. Then establish the following table of frequencies a, b, c, d. Let the four cells be consistent with the quadrants of a coordinate system, represented by the signs.

Example: a = number of employed women, b = number of employed men, N = number of women and men, employed or unemployed.

Definition 2.4: The $\phi$-coefficient is defined as

$$\phi = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}.$$

Remarks:

a. If we assume $X_j$ and $X_k$ to be dichotomous, while they are actually continuous, the $\phi$-coefficient is to be considered as an estimate of Pearson's $r$. In order to obtain a good estimate of $r$, a table (Reference 8) is available which gives a value $k$ by which $\phi$ has to be divided.

In general $\phi$ divided by $k$ corresponds very closely to tetrachoric $r$ (the correlation coefficient which is customarily applied to dichotomized, but really continuous data). So, if computing diagrams for tetrachoric $r$ are not available, $\phi/k$ might be the closest available approximation to tetrachoric $r$.

b. In order to cut out the influence of extreme values which enter the computation of the $\phi$-coefficient and originate from extreme cuts in the distribution, $\phi$ is better divided by $\phi_{max}$, the maximum possible value consistent with the given marginal values. $\phi$ divided by $\phi_{max}$ is probably the best correlation coefficient in use for dichotomized variables.*


3.  The  Tetrachoric  Correlation  Coefficient 


Assumptions: Let $X_j$ and $X_k$ be the two variables under consideration. Assume that the data for both variables are in terms of dichotomies, but that both variables are really continuous and normal in distribution.

Definition of the Coefficient: The statistical considerations necessary for the derivation of the tetrachoric correlation coefficient are extensive. We will state here two of the formulas used to compute the coefficient.

Again denote the two categories of $X_j$ by $X_{j1}$, $X_{j2}$ and the categories of $X_k$ by $X_{k1}$, $X_{k2}$; a, b, c, d are notations for frequencies.

                 $-X_{k1}$     $+X_{k2}$
    $+X_{j1}$        a             b
    $-X_{j2}$        c             d

* E. E. Cureton. Note on $\phi/\phi_{max}$. Psychometrika, 1959, 24, p. 89.


The  statistical  derivation  terminates  in  a  formula  involving 
double  integration,  which  can  be  solved  for  the  tetrachoric  correlation 
coefficient  r,  yielding  a  very  complicated  formula  for  r  (Reference  9). 

If the problem is restricted to distributions cut at the mean, the following formula for the tetrachoric correlation coefficient can be arrived at:

r  =  sin  2tt  ,  N  =  a+b  +  c  +  d. 

N* 

The assumption of equal dichotomies might be a crude one for certain problems. Pearson therefore develops (Reference 9) empirical formulas that give approximately correct $r$'s, the mean error in 15 trials being less than 4 per cent. The simplest of these approximate formulas is the following one

$$r = \cos\left(\pi \, \frac{\sqrt{ad}}{\sqrt{ad} + \sqrt{bc}}\right),$$

where no restriction is put on the point of dichotomy. H. W. Eber (Reference 10) uses this formula for computing a correlation matrix for 3,000 variables.

Remarks:

a. In order to facilitate the labor involved in computing tetrachoric correlation coefficients, Chesire, Saffir, and Thurstone (Reference 11) prepared a set of computing diagrams. Using these diagrams is advisable whenever the coefficient is not required to be of high accuracy. Other diagrams were designed by Hamilton (Reference 12).

b. As for the biserial correlation coefficient, Peters and Van Voorhis (Reference 6) develop a tetrachoric correlation coefficient from widespread classes.

c. Besides the product-moment coefficient, the tetrachoric correlation coefficient is one of the coefficients most often used in factor analysis. Its use is necessary if the data are reported in dichotomies only. If the dichotomies are derived by cutting continuous data at some point, however, product-moment coefficients should be strongly considered instead, since the tetrachoric correlation technique loses some of the available information.

C.  Miscellaneous  Correlation  Coefficients 

1.  The  Contingency  Coefficient 

The contingency coefficient is applied when the variables $X_j$ and $X_k$ can both be classified into two or more categories, and when these categories are not quantitative but qualitative. The formula of the contingency coefficient makes use of the chi-square statistic.

Definition 2.5: The contingency coefficient is defined as

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}.$$

Under certain conditions $C$ is equivalent to Pearson's product-moment correlation coefficient. If the variables are continuous, correction formulas exist; see References 13, 14, and 15.

2. Yule's Coefficient of Association and Yule's Coefficient of Colligation

In connection with the $\phi$-coefficient, Yule (Reference 16) considers two correlation coefficients based on a four-fold table.

Definition 2.6: Yule's coefficient of association is defined as

$$Q = \frac{bc - ad}{ad + bc}.$$

Definition 2.7: Yule's coefficient of colligation is defined as

$$\omega = \frac{\sqrt{bc} - \sqrt{ad}}{\sqrt{ad} + \sqrt{bc}}.$$

The coefficient $\omega$ is equal to $\phi$ if the four-fold table is "equalized", that is, if its cells are

    $\sqrt{ad}$    $\sqrt{bc}$
    $\sqrt{bc}$    $\sqrt{ad}$

3. Thorndike's Median Ratio Coefficient of Correlation

Thorndike (References 17 and 18) developed a correlation coefficient which, under certain conditions (Kelley, Reference 13), is equal to the product-moment correlation coefficient.

Let the variables $x_j$ and $x_k$ be deviates from the mean and let $\sigma_j$ and $\sigma_k$ be the corresponding standard deviations. Supposing the relation of the variables $x_j$ and $x_k$ to be rectilinear, the coefficient of correlation defined as follows represents an inference about the general drift of the relation.

Definition 2.8: Thorndike's median ratio coefficient of correlation is defined as

$$r = \text{median of the } 2N \text{ ratios } \frac{x_{ji}/\sigma_j}{x_{ki}/\sigma_k} \text{ and } \frac{x_{ki}/\sigma_k}{x_{ji}/\sigma_j}, \quad i = 1, \ldots, N.$$
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and 


Before computing the coefficient on the basis of the data in Table 1, we have to consider briefly how to take care of ties in the data. Let $t$ ($u$) be the number of equally ranked individuals in a set of ties; then there are $\frac{1}{2}t(t-1)$ pairs to take care of.

Denote by

$$T = \frac{1}{2}\sum_t t(t-1), \qquad U = \frac{1}{2}\sum_u u(u-1),$$

where $\sum$ means summation over all sets of ties. Then $\tau$ is computed as

$$\tau = \frac{S}{\sqrt{\left[\frac{1}{2}N(N-1) - T\right]\left[\frac{1}{2}N(N-1) - U\right]}}.$$

This is the appropriate form of $\tau$ if ties arise in the data. The computation will be clear from the example. The formula is stated and discussed by Kendall (Reference 3).
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Table  1 


Measurements for Two Variables, Weight and Height, on 10 Individuals*

    Individual   Rank   Weight = X_1   Rank     Height = X_2   d = rank difference   d^2
    A             6       165.00        8         177.80             -2               4
    B             1       189.50        1         187.60              0               0
    C            10       128.00       10         169.00              0               0
    D             9       144.00        4.5**     181.50              4.5            20.25
    E             7       156.50        7         179.70              0               0
    F             8       145.50        9         172.90             -1               1
    G             5       166.00        4.5**     181.50              0.5             0.25
    H             3       178.00        2         185.30              1               1
    I             2       182.50        6         181.00             -4              16
    J             4       167.50        3         182.35              1               1

* The measurements for the 10 individuals were picked randomly from a set of measurements for 130 individuals.

** Individuals D and G are tied for ranks 4 and 5. It is common practice to rank each tied individual by the average of the tied ranks.
To find $S$ we have to compare each individual $i$ with each individual $j$. We score

    $a_{ij} = +1$, if $p_i < p_j$
    $a_{ij} = -1$, if $p_i > p_j$
    $a_{ij} = 0$, if $p_i = p_j$.

Listing the results, also for $b_{ij}$ (the scores for $X_2$), we obtain:

(Listing: individual $i$ compared with individual $j$ for all 45 pairs, with the scores $a_{ij}$ for $X_1$, the scores $b_{ij}$ for $X_2$, and the multiplied scores $a_{ij}b_{ij}$.)

Then

    S = (number of (+1)-scores) - (number of (-1)-scores)
    S = 36 - 8 = 28.

It is

    T = 0, since there are no ties in $X_1$,
    U = $\frac{1}{2}(2 \cdot 1) = 1$, since there is one tie in $X_2$.

Then we obtain

$$\tau = \frac{28}{\sqrt{45 \cdot 44}} = 0.629$$
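The computation of $S$, $T$, $U$ and $\tau$ above can be retraced with a short sketch. It is only an illustration under stated assumptions: the function names are invented for this example, and the pairwise scores are formed from the raw measurements of Table 1 rather than from the ranks, which gives the same products $a_{ij}b_{ij}$.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Table 1 measurements for individuals A..J (weight = X1, height = X2).
weight = [165.00, 189.50, 128.00, 144.00, 156.50, 145.50, 166.00, 178.00, 182.50, 167.50]
height = [177.80, 187.60, 169.00, 181.50, 179.70, 172.90, 181.50, 185.30, 181.00, 182.35]

def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def tie_term(values):
    """(1/2) * sum of t(t-1) over all sets of t tied values."""
    return sum(t * (t - 1) for t in Counter(values).values()) / 2

def kendall_tau(x, y):
    """Kendall's tau in the form that allows for ties."""
    n = len(x)
    # S is the sum of the multiplied scores a_ij * b_ij over all pairs i < j.
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    pairs = n * (n - 1) / 2
    t, u = tie_term(x), tie_term(y)
    return s, t, u, s / sqrt((pairs - t) * (pairs - u))

S, T, U, tau = kendall_tau(weight, height)
print(S, T, U, round(tau, 3))
```

Using the sign of the raw differences avoids an explicit ranking step; a tie in either variable contributes a zero score, exactly as in the scoring rule above.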
2. Example for Spearman's $\rho$-Correlation Coefficient

The coefficient is

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}.$$

Also in Spearman's $\rho$-correlation coefficient we have to take care of ties in the ranking of the two correlated variables. Typify the ties by $t$ and $u$ and define

$$T = \frac{1}{12}\sum_t (t^3 - t), \qquad U = \frac{1}{12}\sum_u (u^3 - u).$$

Then Kendall obtains two equations, deducing them from his general correlation coefficient:

$$\rho = 1 - \frac{6\left(\sum d^2 + T + U\right)}{N(N^2 - 1)} \qquad (1)$$

$$\rho = \frac{\frac{1}{6}N(N^2 - 1) - \sum d^2 - T - U}{\sqrt{\left[\frac{1}{6}N(N^2 - 1) - 2T\right]\left[\frac{1}{6}N(N^2 - 1) - 2U\right]}} \qquad (2)$$

For our data of Table 1 we obtain

$$T = 0, \qquad U = \frac{1}{12}(2^3 - 2) = \frac{1}{2}.$$
Then for Equation 1

$$\rho = 1 - \frac{6\left(43.5 + \frac{1}{2}\right)}{990} = 0.734.$$

Equation 2 yields

$$\rho = 0.736$$
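Equations 1 and 2 can be checked against the Table 1 data with the following sketch. The helper names are hypothetical, and ranking from the largest value downward is an assumption taken from the rank columns of Table 1. The results agree with the values quoted above to within rounding in the third decimal.

```python
from collections import Counter
from math import sqrt

# Table 1 measurements for individuals A..J (weight = X1, height = X2).
weight = [165.00, 189.50, 128.00, 144.00, 156.50, 145.50, 166.00, 178.00, 182.50, 167.50]
height = [177.80, 187.60, 169.00, 181.50, 179.70, 172.90, 181.50, 185.30, 181.00, 182.35]

def average_ranks(values):
    """Rank 1 is the largest value; tied values share the average of their ranks."""
    first_position = {}
    for position, v in enumerate(sorted(values, reverse=True), start=1):
        first_position.setdefault(v, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def tie_term(values):
    """(1/12) * sum of (t^3 - t) over all sets of t tied values."""
    return sum(t**3 - t for t in Counter(values).values()) / 12

n = len(weight)
rank_x, rank_y = average_ranks(weight), average_ranks(height)
sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
T, U = tie_term(weight), tie_term(height)

# Equation 1
rho_eq1 = 1 - 6 * (sum_d2 + T + U) / (n * (n**2 - 1))

# Equation 2
k = n * (n**2 - 1) / 6
rho_eq2 = (k - sum_d2 - T - U) / sqrt((k - 2 * T) * (k - 2 * U))

print(sum_d2, T, U, round(rho_eq1, 3), round(rho_eq2, 3))
```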


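3. Example for Pearson's Product-Moment Correlation Coefficient

We shall use the following form of the coefficient

$$r = \frac{\frac{1}{N}\sum_{i=1}^{N} X_i Y_i - \bar{X}\bar{Y}}{\sigma_X \sigma_Y}, \qquad \sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \bar{X}^2, \quad \sigma_Y^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2.$$

We again will use the data from Table 1. Therefore

$$X = X_1, \quad Y = X_2, \quad N = 10.$$

Then one computes

$$\sum_{i=1}^{10} \frac{X_{1i}X_{2i}}{10} = 29262.37$$

$$\bar{X}_1 = \sum_{i=1}^{10} \frac{X_{1i}}{10} = 162.25, \qquad \bar{X}_1^2 = 26325.06$$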

4. Example for the Biserial Correlation Coefficient

We shall use the coefficient to determine the relation between the variable $X_1$ = size of family and the variable $X_2$ = tendency of children to leave school before the age of eighteen. The data of $X_2$ are given by the two categories: $X_{21}$ = children who remained in school, according to the size of family, and $X_{22}$ = children who left school, according to the size of family. The data are laid out in Table 2.
Table 2. Data for an Example of the Biserial Correlation Coefficient

    Size of Family X_1     Children remaining in School X_21   Children left School X_22   Total
    (Class Marks X_j)      (Frequencies f_1j)                  (Frequencies f_2j)          (f_1j + f_2j)
    12                       2                                    0                           2
    11                       4                                    3                           7
    10                       4                                    2                           6
     9                       4                                    8                          12
     8                      20                                    3                          23
     7                      10                                   17                          27
     6                      24                                   12                          36
     5                      18                                   18                          36
     4                      30                                   10                          40
     3                      34                                   12                          46
     2                      34                                   10                          44
     1                      16                                    5                          21
    Total                  200                                  100                         300

The example is from Reference 6 and the measures are from Reference 19.

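Let us first compute the mean score on $X_1$ in the categories $X_{21}$, $X_{22}$, and $X_{21} + X_{22}$, using the mean formula for grouped data:

$$\bar{X}_{21} = \frac{\sum_{j=1}^{12} f_{1j}X_j}{200} = \frac{2 \cdot 12 + 4 \cdot 11 + \cdots + 16 \cdot 1}{200} = \frac{914}{200} = 4.57$$

$$\bar{X}_{22} = \frac{\sum_{j=1}^{12} f_{2j}X_j}{100} = \frac{3 \cdot 11 + 2 \cdot 10 + \cdots + 5 \cdot 1}{100} = \frac{531}{100} = 5.31$$

$$\bar{X}_{21+22} = \frac{\sum_{j=1}^{12} (f_{1j} + f_{2j})X_j}{300} = \frac{2 \cdot 12 + 7 \cdot 11 + \cdots + 21 \cdot 1}{300} = \frac{1445}{300} = 4.82$$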
Such that

$$M_p = 5.31, \qquad M_t = 4.82.$$

Now we compute the standard deviation of $X_1$ for $X_{21} + X_{22}$. We use again the formula for grouped data:

$$\sigma_t = \sqrt{\frac{\sum_{j=1}^{12} (f_{1j} + f_{2j})(X_j - \bar{X}_{21+22})^2}{300}} = \sqrt{\frac{2(12 - 4.82)^2 + 7(11 - 4.82)^2 + \cdots + 21(1 - 4.82)^2}{300}} = 2.57$$

and

$$p = \frac{N_2}{N} = \frac{100}{300} = 0.33.$$

Then $y = 0.3635$, as taken from a table by Peters and Van Voorhis (Reference 6).

We now compute $r$ as:

$$r = \frac{M_p - M_t}{\sigma_t} \cdot \frac{p}{y} = 0.175$$
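The steps of this example can be retraced with the sketch below. It is an illustration under assumptions: the frequency for family size 12 in the "left school" column is taken as 0 (implied by the row and column totals), the ordinate $y$ is computed from the unit normal curve with Python's `statistics.NormalDist` instead of being read from a table, and all variable names are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

# Table 2: class marks (family size) and frequencies in the two categories.
sizes = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
stayed = [2, 4, 4, 4, 20, 10, 24, 18, 30, 34, 34, 16]   # children remaining in school
left = [0, 3, 2, 8, 3, 17, 12, 18, 10, 12, 10, 5]       # children who left school

def grouped_mean(freqs, marks):
    """Mean for grouped data: sum(f * X) / sum(f)."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

total = [f1 + f2 for f1, f2 in zip(stayed, left)]
n = sum(total)

m_stayed = grouped_mean(stayed, sizes)
m_left = grouped_mean(left, sizes)
m_t = grouped_mean(total, sizes)
sigma_t = sqrt(sum(f * (x - m_t) ** 2 for f, x in zip(total, sizes)) / n)

# M_p and p refer to the category with the higher mean.
if m_left > m_stayed:
    m_p, p = m_left, sum(left) / n
else:
    m_p, p = m_stayed, sum(stayed) / n

# y: ordinate of the unit normal curve at the point that cuts off the proportion p.
std_normal = NormalDist()
y = std_normal.pdf(std_normal.inv_cdf(p))

r_bis = (m_p - m_t) / sigma_t * p / y
print(round(m_p, 2), round(m_t, 2), round(sigma_t, 2), round(y, 4), round(r_bis, 3))
```

Because the sketch carries unrounded intermediate values, its result can differ from the hand computation in the third decimal.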
5. Example for the $\phi$-coefficient

The $\phi$-coefficient is computed from a four-fold table, as are the tetrachoric correlation coefficient and Yule's two coefficients; the contingency coefficient can be computed from a four-fold or a manifold table. We will now demonstrate all the above-mentioned coefficients from an identical four-fold table, which will only be interpreted differently for each coefficient under consideration, in order to allow for the special assumptions of that coefficient.

For the $\phi$-coefficient

$$\phi = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

we will lay out the data in Table 3.

Table 3. Data for an Example of the $\phi$-coefficient

                            X_21 (women)    X_22 (men)    Total
    X_11 (employed)          a = 665         b = 849       1514
    X_12 (unemployed)        c = 1281        d = 205       1486
    Total                      1946            1054        3000


An interpretation is given: We want to determine the relationship of employment status ($X_1$) and sex classification ($X_2$), where both $X_1$ and $X_2$ are given by two categories: $X_{11}$ = being employed, $X_{12}$ = being unemployed, $X_{21}$ = women, $X_{22}$ = men. So, e.g., a = 665 represents the number of employed women among the 1946 women in a sample of 3000 men and women (1946 women, 1054 men). Note that both variables, sex and employment, are truly dichotomous.

We obtain

$$\phi = \frac{849 \cdot 1281 - 665 \cdot 205}{\sqrt{1514 \cdot 1486 \cdot 1054 \cdot 1946}} = 0.443.$$

That is, the relation between sex classification and employment status is a positive one. That means, for the data under consideration, being a man and being employed are positively related.

Connected to the $\phi$-coefficient are Yule's two coefficients. Their computation from the data in Table 3 gives the following results:

Yule's Coefficient of Association:

$$Q = \frac{bc - ad}{ad + bc} = \frac{849 \cdot 1281 - 665 \cdot 205}{849 \cdot 1281 + 665 \cdot 205} = 0.776.$$

Yule's Coefficient of Colligation:

$$\omega = \frac{\sqrt{bc} - \sqrt{ad}}{\sqrt{ad} + \sqrt{bc}} = \frac{\sqrt{849 \cdot 1281} - \sqrt{665 \cdot 205}}{\sqrt{849 \cdot 1281} + \sqrt{665 \cdot 205}} = 0.477.$$
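The three four-fold coefficients computed so far can be sketched as small functions. The cell frequencies are those implied by the products and marginal totals in the computations above (a = 665, b = 849, c = 1281, d = 205); the function names are hypothetical. The results agree with the quoted values to within rounding in the third decimal, and the last line checks the statement that $\omega$ equals $\phi$ for the "equalized" table.

```python
from math import sqrt

# Cell frequencies of the four-fold table used in the example.
a, b, c, d = 665, 849, 1281, 205

def phi_coefficient(a, b, c, d):
    """phi = (bc - ad) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yule_q(a, b, c, d):
    """Yule's coefficient of association."""
    return (b * c - a * d) / (a * d + b * c)

def yule_colligation(a, b, c, d):
    """Yule's coefficient of colligation."""
    return (sqrt(b * c) - sqrt(a * d)) / (sqrt(a * d) + sqrt(b * c))

phi = phi_coefficient(a, b, c, d)
q = yule_q(a, b, c, d)
omega = yule_colligation(a, b, c, d)

# phi of the equalized table (sqrt(ad), sqrt(bc); sqrt(bc), sqrt(ad)) equals omega.
phi_equalized = phi_coefficient(sqrt(a * d), sqrt(b * c), sqrt(b * c), sqrt(a * d))

print(round(phi, 3), round(q, 3), round(omega, 3), round(phi_equalized, 3))
```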


6. Example for the Tetrachoric Correlation Coefficient

We will use the cosine formula of the coefficient

$$r = \cos\left(\pi \, \frac{\sqrt{ad}}{\sqrt{ad} + \sqrt{bc}}\right)$$

and apply it to the data of Table 3. As an interpretation of the four-fold table, consider the case that we have a sample of 3000 teachers divided into successful and unsuccessful ones and that we have information about how many of the successful and how many of the unsuccessful teachers have taken courses in pedagogy beyond 6 hours or less than 6 hours. We want to know the relationship of teacher success and taking courses in pedagogy.

We set

    $X_1$ = teacher success:      $X_{11}$ = successful
                                  $X_{12}$ = unsuccessful
    $X_2$ = courses in pedagogy:  $X_{21}$ = beyond 6 hours
                                  $X_{22}$ = less than 6 hours.

Note that one can think of both variables as being continuous, though they are represented as dichotomous.

$$r = \cos\left(\pi \, \frac{\sqrt{665 \cdot 205}}{\sqrt{665 \cdot 205} + \sqrt{849 \cdot 1281}}\right) = 0.6811$$

The tetrachoric correlation coefficient computed from the tables of Pearson and his students has the value $r = 0.6633$ for the above data. Chesire, Saffir and Thurstone compute a value of $r = 0.6638$ for the considered data by their computing diagrams.
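The cosine formula is easy to evaluate directly. The sketch below applies it to the cell frequencies of the example (a = 665, b = 849, c = 1281, d = 205); the function name is hypothetical, and the formula is only the approximation discussed above, not the exact tetrachoric coefficient.

```python
from math import cos, pi, sqrt

def tetrachoric_cosine(a, b, c, d):
    """Cosine approximation: r = cos(pi * sqrt(ad) / (sqrt(ad) + sqrt(bc)))."""
    root_ad, root_bc = sqrt(a * d), sqrt(b * c)
    return cos(pi * root_ad / (root_ad + root_bc))

r = tetrachoric_cosine(665, 849, 1281, 205)
r_independent = tetrachoric_cosine(1, 1, 1, 1)  # ad = bc gives cos(pi/2), i.e. no correlation

print(round(r, 4), round(r_independent, 4))
```

The second call illustrates a sanity check on the formula: when $ad = bc$ the argument of the cosine is $\pi/2$ and the approximation returns zero.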


7. Example for the Contingency Coefficient

The coefficient

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$

could be applied to determine the relationship between two variables, each described in more than two categories. Let the variables, for example, be eye color of fathers ($X_1$) and eye color of sons ($X_2$). Each variable may be divided into many categories: $X_{11}$ and $X_{21}$ = brown, $X_{12}$ and $X_{22}$ = grey, $X_{13}$ and $X_{23}$ = blue, and so forth. We will apply the coefficient, though, to data reported in a four-fold table, thereby assuming that each variable above has only two categories. We will use the data reported in Table 3.

We will make use of a simple computing formula for $\chi^2$ for the case of a four-fold table:

$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},$$

proved for example in Reference 20.

We obtain

$$C = \sqrt{\frac{\dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}}{N + \dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}}} = \sqrt{\frac{N(ad - bc)^2}{N(a+b)(c+d)(a+c)(b+d) + N(ad - bc)^2}}$$

$$= \sqrt{\frac{(665 \cdot 205 - 849 \cdot 1281)^2}{1514 \cdot 1486 \cdot 1054 \cdot 1946 + (665 \cdot 205 - 849 \cdot 1281)^2}} = 0.405.$$
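As a check on the arithmetic, the four-fold $\chi^2$ formula and the contingency coefficient can be sketched as follows, again with the cell frequencies a = 665, b = 849, c = 1281, d = 205 and with hypothetical function names.

```python
from math import sqrt

def chi_square_fourfold(a, b, c, d):
    """chi^2 = N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)) for a four-fold table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def contingency_coefficient(a, b, c, d):
    """C = sqrt(chi^2 / (N + chi^2))."""
    n = a + b + c + d
    chi2 = chi_square_fourfold(a, b, c, d)
    return sqrt(chi2 / (n + chi2))

chi2 = chi_square_fourfold(665, 849, 1281, 205)
C = contingency_coefficient(665, 849, 1281, 205)

print(round(chi2, 1), round(C, 3))
```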
8. Example for Thorndike's Median Ratio Coefficient of Correlation

$$r = \text{median of the } 2N \text{ ratios } \frac{x_{1i}/\sigma_1}{x_{2i}/\sigma_2} \text{ and } \frac{x_{2i}/\sigma_2}{x_{1i}/\sigma_1}, \quad i = 1, \ldots, N.$$

We want to apply the coefficient to the data of Table 1. Then

    $x_1$ = weight
    $x_2$ = height, and $N = 10$.

We first have to set $x_{1i}$, $x_{2i}$, $i = 1, \ldots, 10$, as deviates from the respective means. The means are $\bar{x}_1 = 162.25$ and $\bar{x}_2 = 179.87$. Subtracting $\bar{x}_1$ from all $x_{1i}$, $i = 1, \ldots, 10$, and $\bar{x}_2$ from all $x_{2i}$, $i = 1, \ldots, 10$, we can compute the standard deviations to get

    $\sigma_1 = 18.08$
    $\sigma_2 = 5.22$.

The ratios $y_i = x_{1i}/\sigma_1$ and $z_i = x_{2i}/\sigma_2$, $i = 1, \ldots, 10$, are formed next. Then $y_i/z_i$ and $z_i/y_i$ are considered. Their median furnishes Thorndike's correlation coefficient. It is computed as $r = 0.872$ from the data of Table 1. Note its rather good agreement with Pearson's product-moment $r = 0.859$.
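The procedure just described (deviates, standard deviations, the two sets of ratios, and their median) can be sketched as below. The function name is hypothetical, the standard deviations use the divisor $N$, and the sketch assumes that no deviate is exactly zero, since a zero deviate would make one of the ratios undefined. Because unrounded means and standard deviations are carried through, the result can differ slightly from the hand-computed value.

```python
from math import sqrt
from statistics import median

# Table 1 measurements for individuals A..J (weight = x1, height = x2).
weight = [165.00, 189.50, 128.00, 144.00, 156.50, 145.50, 166.00, 178.00, 182.50, 167.50]
height = [177.80, 187.60, 169.00, 181.50, 179.70, 172.90, 181.50, 185.30, 181.00, 182.35]

def thorndike_median_ratio(x1, x2):
    """Median of the 2N ratios y/z and z/y of standardized deviates."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    dev1 = [v - mean1 for v in x1]
    dev2 = [v - mean2 for v in x2]
    sd1 = sqrt(sum(v * v for v in dev1) / n)
    sd2 = sqrt(sum(v * v for v in dev2) / n)
    ratios = []
    for u, v in zip(dev1, dev2):
        y, z = u / sd1, v / sd2
        ratios.extend([y / z, z / y])  # assumes y and z are both nonzero
    return median(ratios)

r = thorndike_median_ratio(weight, height)
print(round(r, 3))
```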


2.4 GEOMETRIC INTERPRETATION OF PEARSON'S PRODUCT-MOMENT CORRELATION COEFFICIENT

In this section we want to interpret the correlation coefficient (if we talk about the correlation coefficient, we mean Pearson's product-moment correlation coefficient) in view of its geometric aspects with respect to the factor model.

Let us then assume an N-dimensional Euclidean space with a rectangular Cartesian coordinate system, whose origin is denoted by $O:(0, \ldots, 0)$ and whose unit points are denoted by $E_1:(1, 0, \ldots, 0), \ldots, E_N:(0, \ldots, 0, 1)$. Let us interpret the n variables $Z_j$ as points represented in this system, the points and their coordinates denoted by $Z_j:(Z_{j1}, \ldots, Z_{jN}) = Z_j:(Z_{ji})$. Such a representation for each of the n variables can be called a vector representation, each $Z_j$ being named a vector. Let, further, the N lines $OE_i$ $(i = 1, \ldots, N)$, each passing through the origin and one of the unit points, be called coordinate axes.

Now let us make the following definitions:

Definition 2.9: For any two points $Z_j:(Z_{j1}, \ldots, Z_{jN})$ and $Z_k:(Z_{k1}, \ldots, Z_{kN})$ their distance is defined by

$$D(Z_jZ_k) = \sqrt{\sum_{i=1}^{N} (Z_{ji} - Z_{ki})^2}.$$

If the distance of a point $Z_j$ from the zero point is considered, it is called the norm, denoted by

$$D(OZ_j) = D(Z_j) = \sqrt{\sum_{i=1}^{N} Z_{ji}^2}.$$

Definition 2.10: Let the norm $D(OZ_j)$ be denoted by $\rho_j$. Then the angles which the line $OZ_j$ makes with the axes, denoted by $\theta_{ji} = \angle Z_jOE_i$, are called the direction angles of the line, and their cosines are called the direction cosines, denoted by $\lambda_{ji} = \cos\theta_{ji}$.

From the definition it follows that $\lambda_{ji} = \cos\theta_{ji} = Z_{ji}/\rho_j$, $i = 1, \ldots, N$.

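Now the following interesting implications can be made.

a. From

$$\rho_j = D(OZ_j) = \sqrt{\sum_{i=1}^{N} Z_{ji}^2}$$

follows

$$\rho_j^2 = \sum_{i=1}^{N} Z_{ji}^2.$$

Since

$$\lambda_{ji}^2 = \cos^2\theta_{ji} = \frac{Z_{ji}^2}{\rho_j^2}, \quad i = 1, \ldots, N,$$

it follows

$$\sum_{i=1}^{N} \lambda_{ji}^2 = \sum_{i=1}^{N} \cos^2\theta_{ji} = \sum_{i=1}^{N} \frac{Z_{ji}^2}{\rho_j^2} = \frac{\sum_{i=1}^{N} Z_{ji}^2}{\sum_{i=1}^{N} Z_{ji}^2} = 1.$$

That is, the sum of the squares of the direction cosines of a line in N-space is equal to unity.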

b. Next, denote by $\lambda_{ji}$ and $\lambda_{ki}$ the direction cosines of the vectors $Z_j$ and $Z_k$. Then

$$\lambda_{ji} = \frac{Z_{ji}}{\rho_j} \quad \text{and} \quad \lambda_{ki} = \frac{Z_{ki}}{\rho_k}, \quad i = 1, \ldots, N.$$

We are now interested in the angle of separation of two lines in N-space, more precisely, in the cosine of this angle. We can derive an appropriate formula by using the direction cosine formulas and by referring to the trigonometric properties of a triangle in the plane, visualizing that, if two lines meet in a point, a plane can be drawn through the point containing the two lines. If the two lines do not meet in a point, a line can be drawn parallel to one of them, so that the line and the parallel form the angle we are interested in.

Denote the vertex of the angle by $P:(p_i)$, the angle by $\phi_{jk}$, and distances as follows: $D(PZ_j) = a$, $D(PZ_k) = b$ and $D(Z_jZ_k) = d$. Then we can draw the following picture:

(Figure: triangle with vertex $P$ and sides $PZ_j = a$, $PZ_k = b$, $Z_jZ_k = d$, enclosing the angle $\phi_{jk}$ at $P$.)

$$Z_{ji} = p_i + a\lambda_{ji}, \quad i = 1, \ldots, N,$$
$$Z_{ki} = p_i + b\lambda_{ki}, \quad i = 1, \ldots, N.$$

Then, applying the law of cosines, we obtain

$$d^2 = a^2 + b^2 - 2ab\cos\phi_{jk}. \qquad (3)$$

Applying the distance formula we obtain

$$d^2 = \left(D(Z_jZ_k)\right)^2 = \sum_{i=1}^{N} (Z_{ji} - Z_{ki})^2$$

$$= \sum_{i=1}^{N} \left[(p_i + a\lambda_{ji}) - (p_i + b\lambda_{ki})\right]^2$$

$$= a^2\sum_{i=1}^{N} \lambda_{ji}^2 + b^2\sum_{i=1}^{N} \lambda_{ki}^2 - 2ab\sum_{i=1}^{N} \lambda_{ji}\lambda_{ki}$$

$$= a^2 + b^2 - 2ab\sum_{i=1}^{N} \lambda_{ji}\lambda_{ki}. \qquad (4)$$

This implies by identification of terms in Equations 3 and 4:

$$\cos\phi_{jk} = \sum_{i=1}^{N} \lambda_{ji}\lambda_{ki}.$$

That is, the cosine of the angle of separation of two lines is given by the inner product of the corresponding direction cosine vectors $(\lambda_{j1}, \ldots, \lambda_{jN})$ and $(\lambda_{k1}, \ldots, \lambda_{kN})$.


c. Since

$$\lambda_{ji} = \frac{Z_{ji}}{\rho_j}, \quad i = 1, \ldots, N, \qquad \lambda_{ki} = \frac{Z_{ki}}{\rho_k}, \quad i = 1, \ldots, N,$$

we obtain

$$\cos\phi_{jk} = \sum_{i=1}^{N} \lambda_{ji}\lambda_{ki} = \sum_{i=1}^{N} \frac{Z_{ji}Z_{ki}}{\rho_j\rho_k}.$$

It is $\rho_j = \sqrt{N}$, since

$$\frac{\rho_j}{\sqrt{N}} = \sqrt{\frac{\sum_{i=1}^{N} Z_{ji}^2}{N}} = \text{standard deviation} = 1, \text{ for standardized variables } Z_j.$$

Thus

$$\cos\phi_{jk} = \sum_{i=1}^{N} \frac{Z_{ji}Z_{ki}}{\rho_j\rho_k} = \frac{\sum_{i=1}^{N} Z_{ji}Z_{ki}}{N} = r_{jk}, \quad j, k = 1, \ldots, n.$$

These considerations yield the result that the coefficient of correlation between two standardized variables is the cosine of the angle between their vectors in N-space.
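The result of (a) through (c) can be checked numerically. The sketch below standardizes the two variables of Table 1 (divisor $N$), verifies that the squared norm of a standardized vector equals $N$ and that the squared direction cosines sum to unity, and compares the cosine of the angle between the two vectors with $\sum Z_{ji}Z_{ki}/N$. The function names are hypothetical.

```python
from math import sqrt

# Table 1 measurements for individuals A..J.
weight = [165.00, 189.50, 128.00, 144.00, 156.50, 145.50, 166.00, 178.00, 182.50, 167.50]
height = [177.80, 187.60, 169.00, 181.50, 179.70, 172.90, 181.50, 185.30, 181.00, 182.35]

def standardize(x):
    """Return the variable as deviates from the mean in units of its standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]

def cosine(u, v):
    """Cosine of the angle between two vectors: inner product over product of norms."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (sqrt(sum(p * p for p in u)) * sqrt(sum(q * q for q in v)))

z_j, z_k = standardize(weight), standardize(height)
n = len(z_j)

norm_sq = sum(v * v for v in z_j)                        # rho_j^2, equal to N
direction_cosines = [v / sqrt(norm_sq) for v in z_j]     # lambda_ji = Z_ji / rho_j
sum_sq_cosines = sum(l * l for l in direction_cosines)   # equal to 1

cos_angle = cosine(z_j, z_k)
r_jk = sum(p * q for p, q in zip(z_j, z_k)) / n          # product-moment correlation

print(round(norm_sq, 6), round(sum_sq_cosines, 6), round(cos_angle - r_jk, 12))
```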

d. Our geometric interpretation of a correlation coefficient, so far, started with the consideration of the given raw data, namely the n points $Z_j:(Z_{ji})$ in N-space. Then the cosine of the angle between two such vectors in N-space constitutes the coefficient of correlation between two variables.

Now we assume a correlation matrix $R$, computed from the raw data, and a factorization of this correlation matrix. According to the mathematical model underlying factor analysis each variable $Z_j$ is now expressible by

$$Z_j' = a_{j1}F_1 + a_{j2}F_2 + \cdots + a_{jm}F_m + a_jU_j, \quad j = 1, \ldots, n,$$

where the loadings $a_{ji}$ $(i = 1, \ldots, m)$ were obtained from the factorization.

In this representation the n vectors $Z_j'$ are considered in the space of m common factors and n unique factors, the total-factor space. The vector representation of $Z_j'$ $(j = 1, \ldots, n)$ in this space is denoted by

$$Z_j':(a_{j1}, a_{j2}, \ldots, a_{jm}, 0, \ldots, 0, a_j, 0, \ldots, 0),$$

the $a_{ji}$ $(i = 1, \ldots, m)$ denoting the coordinates of $Z_j'$ with respect to the common-factor axes, the 0 and $a_j$ denoting the coordinates of $Z_j'$ with respect to the unique-factor axes. We now assume that the system of common- and unique-factor axes is rectangular, that is, all factors are mutually orthogonal. Then the angle of separation of two vectors $Z_j'$ and $Z_k'$, represented in this system, is, according to the formula discussed in (b),

$$\cos\phi_{jk} = \sum_{i=1}^{m+n} \lambda_{ji}\lambda_{ki} = \sum_{i=1}^{m} \frac{a_{ji}a_{ki}}{\rho_j\rho_k}, \quad j, k = 1, \ldots, n.$$

Since

$$\rho_j = \sqrt{\sum_{i=1}^{m} a_{ji}^2 + a_j^2} = 1, \quad \text{for } j = 1, \ldots, n$$

$$\left(\sum_{i=1}^{m} a_{ji}^2 + a_j^2 = \text{total variance}\right),$$

it follows that

$$\cos\phi_{jk} = \sum_{i=1}^{m} a_{ji}a_{ki} = r_{jk}'.$$

Thus we obtain that the reproduced correlation coefficient (from the pattern of loadings) of any two variables $Z_j'$, $Z_k'$ is equal to the cosine of the angle between their vectors in the total-factor space.

e. Our last consideration informed us about how a correlation coefficient is described if the variables are assumed to be represented in the total-factor space.

Finally, in factor analysis one usually does not consider the total-factor space but the space of the m common factors only; that is, one regards the n vectors as contained in an m-dimensional space, determined by the m factors. To obtain this m-dimensional space one considers the orthogonal projections of the n vectors from the total-factor space into the common-factor space of m dimensions and defines these orthogonal projections to be the vectors representing the variables in this space, denoted by

$$Z_j'':(a_{j1}, a_{j2}, \ldots, a_{jm}), \quad j = 1, \ldots, n.$$

We assume a rectangular coordinate system to be set up in the common-factor space.

Considering now the angle of separation of two vectors $Z_j''$ and $Z_k''$, represented in this space, one obtains:

$$\cos\psi_{jk} = \sum_{i=1}^{m} \lambda_{ji}\lambda_{ki} = \sum_{i=1}^{m} \frac{a_{ji}a_{ki}}{\rho_j\rho_k},$$

where

$$\rho_j = \sqrt{\sum_{i=1}^{m} a_{ji}^2} = \sqrt{h_j^2} = h_j > 0 \quad (\text{with } h_j^2 = \text{communality}),$$

and from (d)

$$\sum_{i=1}^{m} a_{ji}a_{ki} = r_{jk}',$$

so that

$$\cos\psi_{jk} = \frac{r_{jk}'}{h_jh_k} = r_{jk}'', \quad j, k = 1, \ldots, n.$$

Hence the cosine of the angle between two vectors which represent variables in the common-factor space is equal to the reproduced correlation coefficient, divided by the product of the square roots of the communalities of these two variables. We may call the obtained correlation coefficient $r_{jk}''$ "the correlation coefficient between $Z_j''$ and $Z_k''$ corrected for uniqueness", since only if the two variables do not have any unique variance would $r_{jk}''$ be equal to the reproduced correlation coefficient $r_{jk}'$.
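The relations derived in (d) and (e) can be illustrated with a small numerical sketch. The loadings below are purely hypothetical values for two variables on m = 2 common factors, chosen only for illustration; the unique loadings are set so that each variable has total variance 1.

```python
from math import sqrt

# Hypothetical loadings of two variables on m = 2 common factors (illustration only).
a_j = [0.6, 0.3]
a_k = [0.5, 0.4]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

# Square roots of the communalities h_j^2 and h_k^2.
h_j, h_k = norm(a_j), norm(a_k)

# Unique loadings chosen so that communality plus unique variance equals 1.
u_j, u_k = sqrt(1 - h_j**2), sqrt(1 - h_k**2)

# Total-factor space: m common-factor axes plus one unique-factor axis per variable.
z_j_total = a_j + [u_j, 0.0]
z_k_total = a_k + [0.0, u_k]

r_reproduced = dot(a_j, a_k)                                     # r'_jk
cos_total = dot(z_j_total, z_k_total) / (norm(z_j_total) * norm(z_k_total))
cos_common = r_reproduced / (h_j * h_k)                          # r''_jk, corrected for uniqueness

print(round(r_reproduced, 4), round(cos_total, 4), round(cos_common, 4))
```

In the total-factor space the vectors have unit length, so the cosine equals the reproduced correlation; in the common-factor space the vectors are shorter (length $h_j$), so the cosine is larger in absolute value whenever unique variance is present.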

2.5 SIGNIFICANCE AND RELIABILITY OF PEARSON'S PRODUCT-MOMENT CORRELATION COEFFICIENT

A statistical consideration that can be made on a Pearson product-moment correlation coefficient $r_{jk}$ is the determination of its statistical significance. Since the statistical significance of $r_{jk}$ depends on the sample size N, the following considerations will give us some important information about this dependence, which we shall utilize even more at a later stage.

Let us first briefly consider what is meant by statistical significance. In statistical considerations mostly only sample information is available, on the basis of which one tries to make decisions about the population from which the sample was drawn. These decisions are called statistical decisions. In attempting to reach decisions, one makes assumptions about the population involved. These assumptions, which may or may not be true, are called statistical hypotheses. They mostly are statements about the probability distribution of the population in question. If we assume a certain hypothesis to be true and then find that results observed in a random sample differ markedly from those which we expected under the hypothesis on the basis of pure chance using sampling theory, we would say that the observed differences are significant. We would then reject the hypothesis. Procedures which make it possible to decide whether to accept or reject a hypothesis, or to determine whether observed samples differ significantly from expected results, are called tests of hypotheses or tests of significance. When one tests a hypothesis, the maximum probability with which one is willing to risk the error of rejecting a hypothesis when it should be accepted is called the level of significance of the test. Usually a 5% level of significance is chosen; that means we are 95% confident that we have made the right decision in accepting the hypothesis. If we now consider a sample statistic $S$ and if the sampling distribution of $S$ is approximately normal, then we can be confident of finding the mean $\mu_S$ of the sampling distribution of $S$ in the interval $S - 2\sigma_S$ to $S + 2\sigma_S$ 95.45% of the time, or in the interval $S - 1.96\sigma_S$ to $S + 1.96\sigma_S$ 95% of the time. These intervals are called confidence intervals. The end numbers of these intervals, $S \pm 1.96\sigma_S$, are called confidence limits.
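The two coverage figures and the confidence limits can be reproduced from the unit normal distribution. The sketch below uses Python's `statistics.NormalDist`; the function names and the numerical values passed to `confidence_limits` (a statistic of 0.50 with standard error 0.10) are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def coverage(k):
    """Probability that a normal statistic lies within k standard errors of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def confidence_limits(s, sigma_s, level=0.95):
    """Limits S -/+ z * sigma_S, with z taken from the unit normal distribution."""
    z = std_normal.inv_cdf(0.5 + level / 2)
    return s - z * sigma_s, s + z * sigma_s

cover_2 = coverage(2.0)       # interval S - 2 sigma_S to S + 2 sigma_S
cover_196 = coverage(1.96)    # interval S - 1.96 sigma_S to S + 1.96 sigma_S
lower, upper = confidence_limits(0.50, 0.10)  # hypothetical statistic and standard error

print(round(cover_2, 4), round(cover_196, 4), round(lower, 3), round(upper, 3))
```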

We can now proceed to consider the statistical significance of a
correlation coefficient. A correlation coefficient $r_{jk}$ computed from
the measurements on variables $Z_j$ and $Z_k$ can be considered as an
estimate of the true population correlation coefficient, denoted by
$\rho_{jk}$. The measurements on $Z_j$ and $Z_k$, taken as pairs
$(Z_{ji}, Z_{ki})$, $i = 1, \ldots, N$, are considered a sample from the
population of all possible such pairs. Since two variables are involved,
the population is called bivariate. We assume that it has a bivariate
normal distribution. We are interested in whether the observed
correlation coefficient differs significantly from an expected result.
This obviously depends on the sample size N: the larger N is, the better
the sample coefficient estimates the true population coefficient. A
statement about the error or precision of the estimate is called its
reliability. In order to find out about the statistical significance of
$r_{jk}$ we have to test two hypotheses, namely that $\rho_{jk}$ is zero
or that it is not zero. To be able to test these hypotheses, we have to
know the sampling distribution of $r_{jk}$ for each case. For the
hypothesis $\rho_{jk} = 0$, this distribution is symmetric and can be
described by a statistic involving Student's t-distribution. If
$\rho_{jk} \neq 0$, the sampling distribution of $r_{jk}$ is skewed. Then
Fisher's Z-transformation can be employed to transform the skewed
distribution into one which is approximately normal. Let us now express
these considerations mathematically:



1. Hypothesis $\rho_{jk} = 0$:

$$t = \frac{r_{jk}\sqrt{N-2}}{\sqrt{1 - r_{jk}^2}}$$

has Student's t-distribution with N - 2 degrees of freedom. The
hypothesis is accepted at a predetermined level of significance if, for
$r_{jk}$, the absolute value of t is computed to be less than the t-value
read from Student's t-distribution table at the given level of
significance and at the given degrees of freedom.
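
The test of the hypothesis of zero population correlation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the tabulated t-value is not computed but must be supplied by the reader from a t-table for N - 2 degrees of freedom.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """Return t = r * sqrt(N - 2) / sqrt(1 - r^2) for a sample correlation r."""
    if n <= 2:
        raise ValueError("need N > 2 so that N - 2 degrees of freedom exist")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def zero_correlation_accepted(r: float, n: int, t_table: float) -> bool:
    """Accept the hypothesis rho = 0 when |t| stays below the tabulated value.

    t_table is an input (hypothetical parameter name): the value read from a
    t-table for N - 2 degrees of freedom at the chosen significance level.
    """
    return abs(t_statistic(r, n)) < t_table


if __name__ == "__main__":
    # r = 0.484 with N = 130 are the values of the height/weight example in 2.6.
    print(round(t_statistic(0.484, 130), 3))
```

A large absolute t relative to the tabulated value leads to rejection of the hypothesis, after which confidence limits can be computed.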

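2. Hypothesis $\rho_{jk} \neq 0$: The distribution of $r_{jk}$ is
transformed by Fisher's Z-transformation. We obtain:

$$Z = \frac{1}{2}\log_e\left(\frac{1 + r_{jk}}{1 - r_{jk}}\right)$$

with mean

$$\mu_Z = \frac{1}{2}\log_e\left(\frac{1 + \rho_{jk}}{1 - \rho_{jk}}\right)$$

and standard deviation

$$\sigma_Z = \frac{1}{\sqrt{N - 3}}.$$
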

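Within this context we will be especially interested in finding 95%
confidence limits for $r_{jk}$. We proceed to do so by first testing the
hypothesis that, for a given correlation coefficient $r_{jk}$, the true
population coefficient $\rho_{jk}$ is zero. If the hypothesis is rejected
we are able to compute confidence limits.

We have learned that we can be 95% confident of finding the mean
$\mu_Z$ in the interval $Z \pm 1.96\sigma_Z$. It is

$$Z \pm 1.96\,\sigma_Z = \frac{1}{2}\log_e\left(\frac{1 + r_{jk}}{1 - r_{jk}}\right) \pm \frac{1.96}{\sqrt{N - 3}}$$

and

$$\mu_Z = \frac{1}{2}\log_e\left(\frac{1 + \rho_{jk}}{1 - \rho_{jk}}\right).$$

Equating $\mu_Z$ to the upper and to the lower end of the interval gives

$$\frac{1}{2}\log_e\left(\frac{1 + \rho_{jk}}{1 - \rho_{jk}}\right) = \frac{1}{2}\log_e\left(\frac{1 + r_{jk}}{1 - r_{jk}}\right) + \frac{1.96}{\sqrt{N - 3}} \qquad (5)$$

$$\frac{1}{2}\log_e\left(\frac{1 + \rho_{jk}}{1 - \rho_{jk}}\right) = \frac{1}{2}\log_e\left(\frac{1 + r_{jk}}{1 - r_{jk}}\right) - \frac{1.96}{\sqrt{N - 3}} \qquad (6)$$

With the abbreviation $A(N) = e^{2(1.96)/\sqrt{N - 3}}$, from Equation 5 we
obtain

$$\rho_{jk} = \frac{[A(N) - 1] + [A(N) + 1]\,r_{jk}}{[A(N) + 1] + [A(N) - 1]\,r_{jk}} \qquad (5a)$$

and from Equation 6 we obtain

$$\rho_{jk} = \frac{\left[\frac{1}{A(N)} - 1\right] + \left[\frac{1}{A(N)} + 1\right] r_{jk}}{\left[\frac{1}{A(N)} + 1\right] + \left[\frac{1}{A(N)} - 1\right] r_{jk}} = \frac{[1 - A(N)] + [1 + A(N)]\,r_{jk}}{[1 + A(N)] + [1 - A(N)]\,r_{jk}} \qquad (6a)$$

Equations 5a and 6a furnish the confidence limits for the correlation
coefficient $r_{jk}$, whose corresponding population coefficient is
$\rho_{jk}$.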

One  is  referred  to  Spiegel  (Reference  20)  as  a  reference  for  this 
subsection. 
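
As a numerical check on the confidence limits of this subsection, the following Python sketch evaluates Equations 5a and 6a and compares them with a direct back-transformation of Z +/- 1.96 sigma_Z. It assumes A(N) = exp(2 * 1.96 / sqrt(N - 3)), the value implied by the 95% limits; the function and parameter names are hypothetical.

```python
import math


def a_of_n(n: int, z: float = 1.96) -> float:
    """A(N) = exp(2 z / sqrt(N - 3)); z = 1.96 corresponds to 95% limits."""
    if n <= 3:
        raise ValueError("Fisher's Z-transformation needs N > 3")
    return math.exp(2.0 * z / math.sqrt(n - 3))


def confidence_limits(r: float, n: int, z: float = 1.96) -> tuple:
    """Lower and upper limits from Equations 6a and 5a."""
    a = a_of_n(n, z)
    upper = ((a - 1.0) + (a + 1.0) * r) / ((a + 1.0) + (a - 1.0) * r)  # 5a
    lower = ((1.0 - a) + (1.0 + a) * r) / ((1.0 + a) + (1.0 - a) * r)  # 6a
    return lower, upper


def confidence_limits_via_z(r: float, n: int, z: float = 1.96) -> tuple:
    """Same limits by transforming to Z, shifting, and transforming back.

    atanh(r) equals (1/2) ln((1 + r) / (1 - r)), i.e. Fisher's Z.
    """
    z_r = math.atanh(r)
    sigma = 1.0 / math.sqrt(n - 3)
    return math.tanh(z_r - z * sigma), math.tanh(z_r + z * sigma)


if __name__ == "__main__":
    print(confidence_limits(0.484, 130))
    print(confidence_limits_via_z(0.484, 130))
```

Both routes give the same pair of limits, which shows that the closed forms 5a and 6a are simply the inverse Z-transformation of the shifted Z-values.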


2.6  PEARSON'S  PRODUCT-MOMENT  CORRELATION  COEFFICIENT  DERIVED  FROM 

INCOMPLETE  DATA 

Let us assume that we have n variables and N individuals on
which observations are taken. It can quite often happen in practice
that, for some reason, observations for a variable can be taken only
for some of the N individuals. There are several ways to compute a
Pearson product-moment correlation coefficient on the basis of a
different number of observations for each variable. In the following,
three methods are described. Each time the basic formula (in terms of
raw scores)

$$r_{jk} = \frac{\sum_{i=1}^{N} X_{ji}X_{ki} - \frac{1}{N}\sum_{i=1}^{N} X_{ji}\sum_{i=1}^{N} X_{ki}}{\sqrt{\left[\sum_{i=1}^{N} X_{ji}^2 - \frac{1}{N}\left(\sum_{i=1}^{N} X_{ji}\right)^2\right]\left[\sum_{i=1}^{N} X_{ki}^2 - \frac{1}{N}\left(\sum_{i=1}^{N} X_{ki}\right)^2\right]}} \qquad (1)$$

is adjusted for the situation of incomplete data by particular means.

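1. Method: SRL-Routine for the Computation of $r_{jk}$.

The correlation coefficient is computed on the basis of simultaneously
existing data points for the two variables $X_j$ and $X_k$:

$$r_{jk} = \frac{\sum_{i=i^*} X_{ji}X_{ki} - \frac{1}{N^*}\sum_{i=i^*} X_{ji}\sum_{i=i^*} X_{ki}}{\sqrt{\left[\sum_{i=i^*} X_{ji}^2 - \frac{1}{N^*}\left(\sum_{i=i^*} X_{ji}\right)^2\right]\left[\sum_{i=i^*} X_{ki}^2 - \frac{1}{N^*}\left(\sum_{i=i^*} X_{ki}\right)^2\right]}}$$

where N* is the number of data points which exist for $X_j$ and $X_k$
simultaneously. The index i* picks from the set {1, ..., N} those
numbers which are accounted for in N*.

Rewriting the formula above, we obtain

$$r_{jk} = \frac{\frac{\sum_{i=i^*} X_{ji}X_{ki}}{N^*} - \frac{\sum_{i=i^*} X_{ji}}{N^*}\cdot\frac{\sum_{i=i^*} X_{ki}}{N^*}}{\sqrt{\left[\frac{\sum_{i=i^*} X_{ji}^2}{N^*} - \left(\frac{\sum_{i=i^*} X_{ji}}{N^*}\right)^2\right]\left[\frac{\sum_{i=i^*} X_{ki}^2}{N^*} - \left(\frac{\sum_{i=i^*} X_{ki}}{N^*}\right)^2\right]}}$$

We form the means

$$\bar{X}_j^* = \frac{\sum_{i=i^*} X_{ji}}{N^*} \qquad \text{and} \qquad \bar{X}_k^* = \frac{\sum_{i=i^*} X_{ki}}{N^*}.$$

For the computation of the means only those observations of $X_j$ and
$X_k$ are taken into account where observations exist for $X_j$ and $X_k$
simultaneously.

Then the correlation coefficient formula reduces to

$$r_{jk} = \frac{\frac{\sum_{i=i^*} X_{ji}X_{ki}}{N^*} - \bar{X}_j^*\bar{X}_k^*}{\sqrt{\left[\frac{\sum_{i=i^*} X_{ji}^2}{N^*} - \bar{X}_j^{*2}\right]\left[\frac{\sum_{i=i^*} X_{ki}^2}{N^*} - \bar{X}_k^{*2}\right]}} \qquad (7)$$

A computer program for the above outlined computations can also be
found in Reference 21.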
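
A minimal Python sketch of the first method follows. It is not the SRL routine of Reference 21; it only illustrates Equation 7 under the assumption that a missing data point is marked by None. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def pairwise_correlation(xj: Sequence[Optional[float]],
                         xk: Sequence[Optional[float]]) -> float:
    """Equation 7: use only the individuals observed on both variables."""
    # Keep the N* simultaneously existing data points (index set i*).
    pairs = [(a, b) for a, b in zip(xj, xk) if a is not None and b is not None]
    n_star = len(pairs)
    if n_star < 2:
        raise ValueError("need at least two complete pairs")
    mean_j = sum(a for a, _ in pairs) / n_star  # mean of X_j over i*
    mean_k = sum(b for _, b in pairs) / n_star  # mean of X_k over i*
    cross = sum(a * b for a, b in pairs) / n_star - mean_j * mean_k
    var_j = sum(a * a for a, _ in pairs) / n_star - mean_j ** 2
    var_k = sum(b * b for _, b in pairs) / n_star - mean_k ** 2
    return cross / math.sqrt(var_j * var_k)


if __name__ == "__main__":
    heights = [1.0, 2.0, None, 4.0, 5.0]   # made-up illustrative scores
    weights = [2.0, 4.0, 6.0, None, 10.0]
    print(pairwise_correlation(heights, weights))
```

Individuals with a gap in either variable are dropped entirely, which is why both the means and the cross products rest on the same N* points.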


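2. Method: Computation of $r_{jk}$ by Making Use of all Available Data.

A consequence of the first method is that, in computing only from
simultaneously existing data points for both variables, valuable
information is neglected, especially in computing the means. The means
are based on smaller data sets than are available and may therefore not
describe the true population means as precisely as would be possible by
use of all available data. Therefore the following method of computation
of $r_{jk}$, which takes into account all available data for the
computation of means and standard deviations, is suggested.

Let

$$\bar{X}_j = \frac{\sum_i X_{ji}}{N_1} \qquad \text{and} \qquad \bar{X}_k = \frac{\sum_i X_{ki}}{N_2}$$

where $N_1$ accounts for the data points existing for $X_j$, $N_2$ for the
data points of $X_k$. In both cases summation is done over the set of
existing data points of the variable under consideration.

$$r_{jk} = \frac{\sum_i Z_{ji}Z_{ki}}{N}$$

The correlation coefficient is a dot product of two standardized
variables divided by the number of points taken into consideration.
Since the product between two points has meaning only if neither of the
points is missing, the summation will be done over the number of
simultaneously existing points (i = i*) and the sum will be divided by
N*, accounting for this number of points.

$$r_{jk} = \frac{\sum_{i=i^*} (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)}{N^*\,\sigma_j\,\sigma_k}$$

$$r_{jk} = \frac{\frac{1}{N^*}\sum_{i=i^*} (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)}{\sqrt{\frac{\sum_i (X_{ji} - \bar{X}_j)^2}{N_1}}\,\sqrt{\frac{\sum_i (X_{ki} - \bar{X}_k)^2}{N_2}}}$$

$$= \frac{\frac{\sum_{i=i^*} X_{ji}X_{ki}}{N^*} - \bar{X}_k\frac{\sum_{i=i^*} X_{ji}}{N^*} - \bar{X}_j\frac{\sum_{i=i^*} X_{ki}}{N^*} + \bar{X}_j\bar{X}_k}{\sqrt{\left[\frac{\sum_i X_{ji}^2}{N_1} - \bar{X}_j^2\right]\left[\frac{\sum_i X_{ki}^2}{N_2} - \bar{X}_k^2\right]}}$$

Let

$$\bar{X}_j^* = \frac{\sum_{i=i^*} X_{ji}}{N^*} \qquad \text{and} \qquad \bar{X}_k^* = \frac{\sum_{i=i^*} X_{ki}}{N^*}$$

as in the case of the first method and obtain

$$r_{jk} = \frac{\frac{\sum_{i=i^*} X_{ji}X_{ki}}{N^*} - \bar{X}_k\bar{X}_j^* - \bar{X}_j\bar{X}_k^* + \bar{X}_j\bar{X}_k}{\sqrt{\left[\frac{\sum_i X_{ji}^2}{N_1} - \bar{X}_j^2\right]\left[\frac{\sum_i X_{ki}^2}{N_2} - \bar{X}_k^2\right]}} \qquad (8)$$

If $N_1 = N_2 = N^*$, that is, if both variables are observed on exactly
the same individuals, then $\bar{X}_k = \bar{X}_k^*$ and
$\bar{X}_j = \bar{X}_j^*$. In this case Equations 7 and 8 are identical.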

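3. Method: Substitution for Missing Data Points

Another solution one can think of for the problem of computing $r_{jk}$
from incomplete data is to insert some value for the missing data points
of the variables. The values which suggest themselves for substitution
are the statistical means of the variables. Since all sums are then
taken over N individuals, the formula

$$r_{jk} = \frac{\sum_{i=1}^{N} (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)}{N\,\sigma_j\,\sigma_k}$$

can be reduced to

$$r_{jk} = \frac{\sum_{i=i^*} (X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)}{\sqrt{\sum_i (X_{ji} - \bar{X}_j)^2\,\sum_i (X_{ki} - \bar{X}_k)^2}} \qquad (9)$$

because every substituted point has zero deviation from its mean: the
numerator needs to run only over the simultaneously existing points, and
each sum in the denominator only over the existing points of the
respective variable. The means $\bar{X}_j$, $\bar{X}_k$ (defined as in
the second method) are substituted for missing data points in variables
$X_j$, $X_k$ respectively.

The advantage of this last method is that in the final computation
of the correlation coefficient only one sample size N is used. This can
be of particular importance if the correlation coefficient is later
used for statistical considerations which are based on sample size N.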
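
The second and third methods can be compared with a short Python sketch. It is an illustration only, assuming that missing points are marked by None; the function names are hypothetical. The third method is implemented literally: gaps are filled with the mean of the existing points and the ordinary product-moment formula is then applied to all N individuals.

```python
import math
from typing import List, Optional, Sequence

Data = Sequence[Optional[float]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def all_available_correlation(xj: Data, xk: Data) -> float:
    """Second method: means and variances from all existing points of each
    variable, cross products from the simultaneously existing points."""
    obs_j = [v for v in xj if v is not None]
    obs_k = [v for v in xk if v is not None]
    mean_j, mean_k = _mean(obs_j), _mean(obs_k)
    pairs = [(a, b) for a, b in zip(xj, xk) if a is not None and b is not None]
    cov = sum((a - mean_j) * (b - mean_k) for a, b in pairs) / len(pairs)
    var_j = sum((a - mean_j) ** 2 for a in obs_j) / len(obs_j)
    var_k = sum((b - mean_k) ** 2 for b in obs_k) / len(obs_k)
    return cov / math.sqrt(var_j * var_k)


def mean_substitution_correlation(xj: Data, xk: Data) -> float:
    """Third method: substitute the mean for each missing point, then use
    the ordinary formula with the single sample size N."""
    mean_j = _mean([v for v in xj if v is not None])
    mean_k = _mean([v for v in xk if v is not None])
    filled_j = [mean_j if v is None else v for v in xj]
    filled_k = [mean_k if v is None else v for v in xk]
    n = len(filled_j)
    cov = sum((a - mean_j) * (b - mean_k)
              for a, b in zip(filled_j, filled_k)) / n
    var_j = sum((a - mean_j) ** 2 for a in filled_j) / n
    var_k = sum((b - mean_k) ** 2 for b in filled_k) / n
    return cov / math.sqrt(var_j * var_k)


if __name__ == "__main__":
    xj = [1.0, 2.0, 3.0, 4.0, 5.0, None]   # made-up illustrative scores
    xk = [2.0, 1.0, 4.0, 3.0, None, 6.0]
    print(all_available_correlation(xj, xk))
    print(mean_substitution_correlation(xj, xk))
```

Because substituted points contribute nothing to the cross products but the division is still by N, the mean-substitution value equals the all-available-data value multiplied by N*/sqrt(N1 N2), so it is pulled toward zero as more points are missing.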

Remarks:

A. One general remark can be made concerning the three discussed
methods: If the total number of observations is large, some missing data
points will not affect the correlation coefficient, computed by any of
the three methods, very much. This is based on the fact that the mean,
with large sample size N, becomes nearly stable.

B. It is important to know what to do when data points are missing.
An example can be given reflecting this importance. A correlation matrix
was computed by using Equation 7. It happened that for the computation
of one $r_{ab}$ there were only a very few simultaneous observations on
variables a and b, while for all other computations the number of
simultaneous observations was much larger and almost equal. This fact
was not observed when the correlation matrix was established. At another
step of computation later on, however, it was exhibited how influential
the different numbers of observations were: The correlation matrix R was
no longer Gramian (symmetric and all principal minors greater than or
equal to zero), which it should have been according to the way it was
derived as $R = ZZ^T/N$. In using Equation 7, as well as Equation 8, the
N is different, however, for each element of R. Only by using Equation 9
does one compute all elements on the basis of the same sample size N.
This is an advantage with respect to preserving Gramian properties. On
the other hand, substitution of means for missing data points may
disturb the true relations of the variables too much, so that a later
factor analysis of the correlation matrix may no longer reflect the true
intercorrelations among the variables. This suggests that the
product-moment correlation coefficient should be computed by either
Equation 7 or Equation 8.

Example: As an example for the three considered methods, 130 pairs of
adult male height and weight measurements were selected. Using Pearson's
product-moment correlation formula, the correlation coefficient of the
two variables, on the basis of 130 pairs of measurements, is computed to
be 0.484.

To exhibit the three formulae for different degrees of missing data,
three random samples were drawn from the sample of 130 measurements.

a. A random subset was drawn such that 75% of all available data
were used, 50% in complete pairs (height, weight), so that 50% of
complete data pairs were missing.

b. Next, a random subset was drawn such that 85% of all available
data were used, 70% in complete pairs, so that 30% of complete data pairs
were missing.

c. In the same manner, a random sample was drawn such that 95%
of all available data were used, 90% in complete pairs, so that 10% of
complete data pairs were missing.

The results of the computation are listed below.

                       Missing data pairs
    Computation by     50%       30%       10%
    Equation 7         0.544     0.508     0.492
    Equation 8         0.557     0.537     0.500
    Equation 9         0.383     0.437     0.473

It is seen that the values computed by Equations 7 and 8 converge from
above and the values computed by Equation 9 from below to the "true"
value 0.484.


2.7  MULTIVARIATE  CORRELATION — PARTIAL  AND  MULTIPLE  CORRELATION 

COEFFICIENTS 

Since in later sections we shall use the multiple correlation
coefficient, we will briefly consider it and also, for the sake of
completeness, the partial correlation coefficient in this subsection.

To help clarify the nature of both coefficients let us consider
the following problem. Assume that the variables (scores on them are
given) stature, intelligence, and quickness of decision contribute to
leadership. We term the factor leadership the dependent variable and
the other three factors independent variables. Then, if we determine the
correlation of the dependent variable with one of the independent
variables, while the influence of the other independent variables is held
constant, we determine what is named the coefficient of partial
correlation between the two variables under consideration.

Mathematically we can express the above problem in the form of a
regression equation. Let $x_1$, $x_2$, $x_3$, in deviate form, represent
the independent variables and let $x_0$ represent the dependent variable,
which is estimated from the independent variables. The equation

$$x_0 = b_{01.23}\,x_1 + b_{02.13}\,x_2 + b_{03.12}\,x_3$$

is called a regression equation of $x_0$ on $x_1$, $x_2$, $x_3$, the b's
being constants. The graph of $x_0$ versus $x_3$, for example, is a
straight line with slope $b_{03.12}$. In this coefficient the indices
left of the dot show the two related variables, while the indices right
of the dot show the variables held constant. With respect to the fact
that $x_0$ varies due partially to the variation in $x_1$ and due
partially to the variation in $x_2$ and $x_3$, the b-coefficients are
called partial regression coefficients.

Generalizing the above, a regression equation of $x_0$ on
$x_1, x_2, \ldots, x_k$ can be written as

$$x_0 = b_{01.23\ldots k}\,x_1 + b_{02.134\ldots k}\,x_2 + \cdots + b_{0k.12\ldots(k-1)}\,x_k$$

The partial regression coefficients can, if necessary, be computed
by the Doolittle Method. Then the partial correlation coefficients are
easily developed from the notion of partial regression coefficients.

When, in 2.2, we developed Pearson's product-moment correlation
coefficient, we learned that the correlation coefficient $r_{jk}$ is
given by the slope corrected for the different measures of variability of
$x_j$ and $x_k$: $r_{jk} = b_{jk}\,\sigma_k/\sigma_j$. Hence, a partial
regression coefficient is the slope of the line relating the paired
measures of a dependent variable and one independent variable, when the
influence of the other independent variables has been excluded from
consideration but when the units of measurement are not necessarily of
equal variability. Corresponding to the development of Pearson's
product-moment correlation coefficient let us now develop the partial
correlation coefficient. Let $b_{0i.12\ldots)i(\ldots k}$ denote the
partial regression coefficient between the dependent variable $x_0$ and
the independent variable $x_i$, while the independent variables
$x_1, x_2, \ldots, x_k$, excluding $x_i$, are held constant. Let
$\sigma_{0.12\ldots k}$ and $\sigma_{i.12\ldots)i(\ldots k}$ stand for the
standard deviations of variables $x_0$ and $x_i$ when the effects of
variables $x_1, x_2, \ldots, x_k$ and $x_1, x_2, \ldots)x_i(\ldots, x_k$,
respectively, have been ruled out. Let $r_{0i.12\ldots)i(\ldots k}$
denote the partial correlation coefficient. Then

$$r_{0i.12\ldots)i(\ldots k} = b_{0i.12\ldots)i(\ldots k}\,\frac{\sigma_{i.12\ldots)i(\ldots k}}{\sigma_{0.12\ldots k}}$$

and

$$r_{i0.12\ldots)i(\ldots k} = b_{i0.12\ldots)i(\ldots k}\,\frac{\sigma_{0.12\ldots k}}{\sigma_{i.12\ldots)i(\ldots k}}.$$

Since $r_{0i.12\ldots)i(\ldots k} = r_{i0.12\ldots)i(\ldots k}$, we define
the partial correlation coefficient as:

$$r_{0i.12\ldots)i(\ldots k} = \sqrt{b_{0i.12\ldots)i(\ldots k}\,\frac{\sigma_{i.12\ldots)i(\ldots k}}{\sigma_{0.12\ldots k}}\cdot b_{i0.12\ldots)i(\ldots k}\,\frac{\sigma_{0.12\ldots k}}{\sigma_{i.12\ldots)i(\ldots k}}} = \sqrt{b_{0i.12\ldots)i(\ldots k}\;b_{i0.12\ldots)i(\ldots k}}$$

The sign is the sign computed for $b_{0i.12\ldots)i(\ldots k}$ or
$b_{i0.12\ldots)i(\ldots k}$, both being the same.
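
For the smallest case of one dependent variable and two independent variables, the definition of the partial correlation coefficient as the signed geometric mean of the two partial regression coefficients can be checked numerically. The Python sketch below assumes standardized variables, for which the partial regression coefficients have the familiar closed forms in terms of the three simple correlations; the function names are hypothetical.

```python
import math


def partial_regression_pair(r01: float, r02: float, r12: float) -> tuple:
    """Return (b_01.2, b_10.2) for standardized variables x0, x1, x2.

    b_01.2 is the slope of x0 on x1 with x2 held constant,
    b_10.2 the slope of x1 on x0 with x2 held constant.
    """
    numerator = r01 - r02 * r12
    b01_2 = numerator / (1.0 - r12 ** 2)
    b10_2 = numerator / (1.0 - r02 ** 2)
    return b01_2, b10_2


def partial_correlation(r01: float, r02: float, r12: float) -> float:
    """r_01.2 = sqrt(b_01.2 * b_10.2), carrying the common sign of the b's."""
    b01_2, b10_2 = partial_regression_pair(r01, r02, r12)
    return math.copysign(math.sqrt(b01_2 * b10_2), b01_2)


if __name__ == "__main__":
    # Made-up correlations, used only to exercise the functions.
    print(partial_correlation(0.6, 0.5, 0.4))
```

The result agrees with the usual first-order formula (r01 - r02 r12) / sqrt((1 - r02^2)(1 - r12^2)), since both partial regression coefficients share the same numerator.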

Referring again to our example in the beginning of this subsection,
another question we might be interested in could be: What is the
correlation between leadership and the three independent variables taken
jointly? The coefficient which describes the relationship between the
dependent variable and the independent variables, taken together, is
called the multiple correlation coefficient.

For a certain individual we actually get a score on the dependent
variable, call it $Z_{0i}$ (i = 1, ..., N) in standard form. On the other
hand, by the regression equation we estimate such a score. So the
multiple correlation coefficient, denoted by $R_{0.12\ldots k}$, is
defined as the correlation between the observed $Z_0$ and the computed
$Z'_0$,

$$R_{0.12\ldots k} = \frac{\sum_{i=1}^{N} Z_{0i}\,Z'_{0i}}{N\,\sigma_{Z_0}\,\sigma_{Z'_0}}$$

with $Z'_0 = \beta_{01.23\ldots k}\,Z_1 + \cdots + \beta_{0k.12\ldots(k-1)}\,Z_k$.
Some computation done on the above equations yields

$$R_{0.12\ldots k} = \sqrt{\beta_{01.23\ldots k}\,r_{01} + \cdots + \beta_{0k.12\ldots(k-1)}\,r_{0k}},$$

$r_{01}, \ldots, r_{0k}$ being Pearson's product-moment correlation
coefficients.


Section III


THE  CORRELATION  MATRIX 

3.1  INTRODUCTION 

Since  almost  every  factor  analytic  technique  begins  with  a  correlation 
matrix,  properties  of  correlation  matrices  and  techniques  concerned  only  with 
correlation  matrices  are  presented  in  this  section. 

Those  theorems  and  definitions  from  eigenvalue  theory  which  are 
particularly  applicable  to  correlation  matrices  and  which  will  be  needed 
in  factor  analyses  are  presented  in  subsection  3.2.  Subsection  3.3  contains 
a  definition  of  a  correlation  matrix  along  with  those  properties  which  are 
important  to  factor  analysis.  Subsection  3.4  concludes  the  section  with  a 
presentation  of  scaling  techniques  based  on  sample  size. 

3.2  EIGENVALUE-EIGENVECTOR  THEORY 

In  this  subsection  we  will  consider  those  definitions  and  theorems 
from  the  eigenvalue-problem  theory  which  are  necessary  for  and  used  in 
the  development  of  factor  analysis. 

Let  us  first  state  the  eigenvalue  problem. 

Consider the following algebraic problem: Given a matrix A of order
n, determine a scalar $\lambda$ and an n-dimensional nonzero vector x
such that

$$Ax = \lambda x$$

(A and x can be over the complex field, $\lambda$ a complex number).

Definition 3.1: The above problem is defined as the eigenvalue
problem. The eigenvalue problem can be rewritten as

$$(A - \lambda I_n)x = 0.$$

This system of n homogeneous linear equations in n unknowns has
nontrivial solutions if and only if the determinant of the matrix of
coefficients vanishes, i.e.,

$$\det(A - \lambda I_n) = \begin{vmatrix} a_{11} - \lambda & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0.$$

Expanding the determinant we obtain a polynomial in $\lambda$ of degree
n, denoted by $\phi(\lambda)$, so that the requirement is
$\phi(\lambda) = 0$.

Definition 3.2: The equation $\phi(\lambda) = 0$ is called the
characteristic equation. The n roots of $\phi(\lambda)$ are named the
eigenvalues of the matrix A. Associated with each such eigenvalue
$\lambda_i$ is a vector $x_i$, named an eigenvector of A.

Completing the statement of the problem we have the following theorem:

Theorem 3.1: The equation $Ax = \lambda x$ has nontrivial solutions x
iff $\lambda$ is an eigenvalue of A.

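Example

Let A be:

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$

The eigenvalue problem is

$$\det(A - \lambda I_2) = \begin{vmatrix} 1 - \lambda & 2 \\ 2 & 1 - \lambda \end{vmatrix} = \lambda^2 - 2\lambda - 3 = 0$$

with solutions $\lambda_1 = 3$ and $\lambda_2 = -1$, the eigenvalues of
the problem.

With each of the two eigenvalues is associated an eigenvector. For
$\lambda_1 = 3$ and $\lambda_2 = -1$ the system $(A - \lambda I_n)x = 0$
each time reduces to a single equation:

    $\lambda_1 = 3$ gives $x_1 - x_2 = 0$

    $\lambda_2 = -1$ gives $x_1 + x_2 = 0$.

The complete solution set is then described by

$$x^{(1)} = c\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad x^{(2)} = c\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad c \neq 0.$$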
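
The example can be verified with a few lines of Python. The sketch assumes the symmetric matrix A with rows (1, 2) and (2, 1), which is the matrix consistent with the eigenvalues 3 and -1 and with the two eigenvector equations of the example; the function names are hypothetical.

```python
import math


def symmetric_2x2_eigenvalues(a: float, b: float, d: float) -> tuple:
    """Eigenvalues of [[a, b], [b, d]] from the characteristic equation
    lambda^2 - (a + d) lambda + (a d - b^2) = 0."""
    trace = a + d
    det = a * d - b * b
    # For a real symmetric matrix the discriminant (a - d)^2 + 4 b^2 >= 0.
    disc = math.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def residual(a: float, b: float, d: float, lam: float, x: tuple) -> tuple:
    """Return A x - lambda x; the zero vector confirms an eigenpair."""
    x1, x2 = x
    return (a * x1 + b * x2 - lam * x1, b * x1 + d * x2 - lam * x2)


if __name__ == "__main__":
    print(symmetric_2x2_eigenvalues(1.0, 2.0, 1.0))
    print(residual(1.0, 2.0, 1.0, 3.0, (1.0, 1.0)))
    print(residual(1.0, 2.0, 1.0, -1.0, (1.0, -1.0)))
```

The two eigenvectors (1, 1) and (1, -1) are orthogonal, as expected for distinct eigenvalues of a real symmetric matrix.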


Let us further restrict ourselves to real symmetric matrices A, since
the matrices we deal with in factor analysis (the correlation matrices)
are of this kind.

Let us consider the characteristic equation $\phi(\lambda) = 0$, stating
the following:

Theorem 3.2: The coefficient of $\lambda^r$ ($r \le n$) in
$\phi(\lambda)$ is $(-1)^r$ times the sum of the principal minors of
order n-r of A. In particular, the coefficient of $\lambda^n$ is
$(-1)^n$, and the constant term is det A.

In the special case where r = n-1, the coefficient of $\lambda^{n-1}$ is

$$(-1)^{n-1}(a_{11} + a_{22} + \cdots + a_{nn}) = (-1)^{n-1}\sum_{i=1}^{n} a_{ii}$$

where

$$\sum_{i=1}^{n} a_{ii}$$

is called the trace of A.

It  will  be  useful  to  know  the  following  two  theorems  about  the  roots 
of  the  characteristic  equation: 

Theorem 3.3: If $\lambda$ is a simple root of $\phi(\lambda) = 0$, then
the rank of $(A - \lambda I)$ is n-1.

Theorem 3.4: If $\bar{\lambda}$ is an r-fold root of $\phi(\lambda) = 0$,
then the rank of $(A - \bar{\lambda} I)$ is n-r. [A root $\bar{\lambda}$
is called r-fold if the factor $(\lambda - \bar{\lambda})$ is contained
in $\phi(\lambda)$ r times. A root which is not an r-fold root with
r > 1 is called a simple root.]


Let  us  next  consider  some  results  about  the  eigenvalues  and  eigenvectors. 

Theorem  3.5:  The  eigenvalues  of  a  real  symmetric  matrix  are  all  real. 

Theorem 3.6: Eigenvectors associated with the eigenvalues of a real
symmetric matrix have all real components.

Theorem  3.7:  Eigenvectors  associated  with  distinct  eigenvalues  of  a 
real  symmetric  matrix  A  are  orthogonal. 

Let  us  now  put  one  more  restriction  on  the  matrix  A,  namely  the 
restriction  that  all  its  elements  shall  be  greater  than  zero. 

Theorem 3.8: Let all elements of the real symmetric matrix A be
positive. Then A always has an eigenvalue $\lambda$ which is real and
positive, which is a simple root of the characteristic equation, and
which is not exceeded in modulus by any other eigenvalue. The eigenvector
corresponding to $\lambda$ has positive components and is essentially
unique (up to scale factors).
(The theorem is due to Perron (for the proof see Reference 22). It can be
extended to so-called irreducible matrices, a case which will not be
considered, though, in this context.)

3.3  DEFINITIONS  AND  PROPERTIES 

We  begin  this  subsection  with 

Definition 3.3: A correlation matrix R is a square matrix where each
element $r_{jk}$ is the correlation between the variables $Z_j$ and
$Z_k$.

In the sequel, it is assumed that Pearson's product-moment correlation
is used.

The most important properties of a correlation matrix from the point
of view of factor analysis are included in the statement that a
correlation matrix is Gramian. A Gramian matrix may be defined by

Definition 3.4: Let R be symmetric. Then R is called Gramian
if it satisfies any one of the following equivalent conditions:

1. R is positive semi-definite.

2. R has all non-negative eigenvalues.

3. R can be represented by the matrix product $AA^T$.

4. R has non-negative principal minors.

5. The inner product $(RX|X) \ge 0$ for all X.

Obviously the correlation matrix R is Gramian since it is obtained
by the product of score matrices,

$$R = \frac{1}{N}\,ZZ^T,$$

where N is the number of observations.
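
Condition 4 of Definition 3.4 lends itself to a direct check on small matrices. The Python sketch below tests symmetry and the sign of every principal minor; it is meant for illustration on matrices of low order only, since the number of principal minors grows as 2^n. The function names and both example matrices are made up for this sketch.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def determinant(m: Matrix) -> float:
    """Determinant by Laplace expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for col in range(n):
        minor = [[row[c] for c in range(n) if c != col] for row in m[1:]]
        total += (-1) ** col * m[0][col] * determinant(minor)
    return total


def principal_minors(m: Matrix) -> List[float]:
    """Determinants of all submatrices built from equal row/column sets."""
    n = len(m)
    minors = []
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            minors.append(determinant(sub))
    return minors


def is_gramian(m: Matrix, tol: float = 1e-12) -> bool:
    """Symmetric with all principal minors non-negative (within tol)."""
    n = len(m)
    symmetric = all(abs(m[i][j] - m[j][i]) <= tol
                    for i in range(n) for j in range(n))
    return symmetric and all(d >= -tol for d in principal_minors(m))


if __name__ == "__main__":
    consistent = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
    inconsistent = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
    print(is_gramian(consistent), is_gramian(inconsistent))
```

A matrix like the second one cannot arise from a single score matrix, but it can arise when each element is computed from a different subset of observations, as noted in the remarks of subsection 2.6.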

3.4  SCALING  TECHNIQUES  BASED  ON  SAMPLE  SIZE 

Two correlation matrices with identical elements will, of course,
yield identical factor analyses. If it were the case that identical
elements had different significance levels, these differences in
reliability would not appear in the factors. Thus, in order for a
factor analysis to reflect the significance of the correlation
coefficients, the correlation coefficients should be scaled in
accordance with their significance.

As used in Section 2.5, the variable

$$Z_r = \frac{1}{2}\ln\frac{1 + r}{1 - r}, \quad \text{Fisher's Z-transformation,} \qquad (1)$$

is approximately normally distributed with

$$\sigma_Z = \frac{1}{\sqrt{N - 3}} \qquad (2)$$

an unbiased estimate of the standard deviation, where N is the size of
the random sample used in computing the correlation coefficient, r.
Then, for 95% of the samples, the variable

$$Z_\rho = Z_r - \frac{1.96}{\sqrt{N - 3}} \qquad (3)$$

will be less than the true population variable. Thus, Equation 3 may be
used to obtain a scaled correlation coefficient $\rho$ given the observed
correlation coefficient r. On the average, $\rho$ will have one chance
out of twenty of exceeding the true population correlation coefficient.
The probability may be adjusted by changing the numerator of the second
term in Equation 3.

Equation 1 may be used to solve for $\rho$ as a function of r by
substitution into Equation 3. We obtain (as derived in Section 2.5):

$$\rho = \frac{[1 - A(N)] + [1 + A(N)]\,r}{[1 + A(N)] + [1 - A(N)]\,r} \qquad (4)$$

Equation 4 is the formula to be used to scale observed correlation
coefficients, r.
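
The scaling formula can be applied element by element to a whole correlation matrix, as in the Python sketch below. It assumes A(N) = exp(2 z / sqrt(N - 3)), with z standing for the numerator of the second term in Equation 3; the default z = 1.96 is taken over from Section 2.5 as an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def scale_correlation(r: float, n: int, z: float = 1.96) -> float:
    """Scaled coefficient rho from an observed r and sample size N.

    z is the adjustable numerator of Equation 3 (assumed default 1.96).
    """
    if n <= 3:
        raise ValueError("need N > 3")
    a = math.exp(2.0 * z / math.sqrt(n - 3))
    return ((1.0 - a) + (1.0 + a) * r) / ((1.0 + a) + (1.0 - a) * r)


def scale_matrix(corr: Sequence[Sequence[float]], n: int,
                 z: float = 1.96) -> List[List[float]]:
    """Apply the scaling to every element; r = 1 on the diagonal maps to 1."""
    return [[scale_correlation(r, n, z) for r in row] for row in corr]


if __name__ == "__main__":
    observed = [[1.0, 0.5], [0.5, 1.0]]   # made-up observed correlations
    for size in (30, 130, 1000):
        print(size, scale_matrix(observed, size)[0][1])
```

The scaled value approaches the observed value as N grows, so coefficients based on small samples are discounted more heavily than those based on large samples.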

Since each element of a correlation matrix $R = (r_{ij})$ is a
correlation coefficient, Equation 4 may be written as

$$\rho_{ij} = \frac{[1 - A(N)] + [1 + A(N)]\,r_{ij}}{[1 + A(N)] + [1 - A(N)]\,r_{ij}}$$

for application to correlation matrices with observed correlations,
$r_{ij}$.

Figure 1 shows curves relating r and $\rho$ for various sample sizes.
A chart to be used in the same manner as the chart presented here appears
in Ezekiel and Fox, Methods of Correlation and Regression Analysis
(Reference 23, p. 294). However, the shape of the curves in the chart
differs from those presented here, and the derivation of the chart is
not given.

[Figure 1: scaled correlation (vertical axis) plotted against observed
correlation, r (horizontal axis), for various sample sizes.]

Figure 1. Correlation Coefficient Scaling Chart


Section IV


TECHNIQUES  OF  FACTOR  ANALYSIS 


4.1  INTRODUCTION 

After the correlation matrix, which furnishes the basic material
for a factor analysis, has been investigated, one can proceed to
consider techniques of factor analysis. This is done by briefly
reviewing the model in 4.2 and by then discussing, in 4.3, the
properties of the two most important and popular factor analysis methods,
the centroid and the principal-factors methods. Starting with subsection
4.4, specific problems of factoring a correlation matrix are discussed:
4.4 presents a new technique to estimate communalities; 4.5 compares
most of the important completeness criteria; 4.6, called "Eigenvalues
and Their Bounds", suggests a way to an answer to the important question
of the right sample size. The section ends with a brief discussion of
factor scores in 4.7.

4.2  REVIEW  OF  THE  MODEL 

In  this  subsection  the  model  will  be  presented  in  greater  detail 
stating  basic  definitions  and  equations. 

We  begin  with  the  two  basic  theorems  of  factor  analysis: 

Theorem 4.1: For every correlation matrix R there exists a
corresponding factor matrix F such that

$$FF^T = R.$$

Furthermore,

Theorem 4.2: There exists an infinite number of factor matrices
F which reproduce any given correlation matrix R.

The problem, then, is not only to find an F, but to find the F
that satisfies a given set of initial conditions which are, more often
than not, subjective decisions and boundary criteria. The solution of
the factor analysis problem consists of two basic steps:

1. Factoring problem - factor a given R into a factor matrix
with an arbitrary reference frame.

2. Rotational problem - rotate the arbitrary reference frame
into a "preferred" or "simplifying" position.
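
Theorem 4.2 and the rotational problem can be illustrated for two factors: any rotation of the reference frame changes the loadings but leaves the reproduced matrix unchanged. The Python sketch below uses a made-up loading matrix with two common factors, so the reproduced matrix carries communalities rather than ones in the diagonal; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def reproduce(f: Matrix) -> Matrix:
    """Return F F^T, the matrix reproduced by the factor matrix F."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in f]
            for row_i in f]


def rotate(f: Matrix, theta: float) -> Matrix:
    """Rotate a two-column factor matrix by the angle theta (radians).

    Each row (a, b) is multiplied by the orthogonal matrix
    [[cos, sin], [-sin, cos]].
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in f]


if __name__ == "__main__":
    loadings = [[0.8, 0.3], [0.7, 0.4], [0.6, -0.5]]  # illustrative values
    rotated = rotate(loadings, 0.7)
    print(reproduce(loadings))
    print(reproduce(rotated))
```

Since every angle gives another valid F, the choice among them has to come from outside the factoring step, which is exactly the role of the "preferred" position in step 2.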

Factorial methods were developed primarily for the purpose of
investigating and identifying the principal dimensions or categories of
mentality and thus are plagued by the non-mathematical justifications
which are used to evaluate them. A technique infallible to a psychologist
can be worthless to the engineer grading castings or a company rating its
employees. Consequently, some of the basic definitions and techniques
are given next using mathematical notation, while comments on reliability
and practicality for application result from longhand factor
interpretation. From References 2 and 24 come necessary basic definitions
and equations.

It is the purpose of factor analysis to represent a variable $X_j$ in
terms of several underlying factors, or, as Harman (Reference 2) states,
"hypothetical constructs". There are various kinds of factors:

Common factors - involved in more than one variable

a. General factor - present in all variables

b. Group factor - present in more than one but not in all variables

Unique factors - involved in a single variable.

Wo  now  use  the  notation  F^,  Fg,...,  F^  for,  say,  m  common  factors  and 

^1>  ^2 j • • • >  for,  say,  n  unique  factors  to  express  linearly  any  variable 
in  terms  of  the  factors  as  follows: 


X.  •  ajlF1  t  aj2F2  t 


+  a.  F  + 
jm  m 


aiV 


For  a  particular  individual  or  observation  we  have 


XJi  *  *jlFU  *  aj2r2i  * 


+  a.  F  .  +  a.U..  .  ,  „ 

3m  mx  3  31,  1  =  1,...,  N. 

n  ,  p=l,...,  m)  are  the  elements  of  the 


The  coefficients  a.  (j  =  1,.. 

3P 

factor  matrix  and  are  referred  to  as  the  factor  loadings  composing  the 
factor  matrix 


F  =  [a.  ] 
IP 
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The  total  variance  of  Xj  can  be  divided  into  two  parts ,  namely , 
that  part  which  it  shares  with  other  variables  and  that  part  which  is 
unique.  For  example. 


I  fa.,2  If?.  +  ...  +  a?  £  F2.+  a2  £  U2 
H  |  ]1  1  li  ]m  L  mi  j  *•  31 


+  2ail3j2  £  FUF2.  +  •••  +  2a.,ma..  £  F^U., 


3  bn  j 


mi  31 


where  all  summation  limits  are  i  =  1,  2,  ....  N 

If the variables are in standard form and the factors are uncorrelated,

$$1 = \sigma_j^2 = a_{j1}^2 + \cdots + a_{jm}^2 + a_j^2 .$$


The terms on the right represent portions of the variance ascribable to the
factors (i.e., $a_{j1}^2$ is the contribution of $F_1$ to the unit variance of $Z_j$).
The total contribution of a factor $F_p$ to the variances of all variables is
defined to be

$$V_p = \sum_{j=1}^{n} a_{jp}^2 .$$


Uniqueness can be further broken down into specific, $S_j$, and error,
$E_j$, factors. Since error and specific factors are uncorrelated,

$$a_j^2 = b_j^2 + c_j^2$$

where $b_j$ and $c_j$ are the respective factor loadings of $S_j$ and $E_j$.
Therefore the total variance can be expressed

$$1 = h_j^2 + a_j^2 = h_j^2 + b_j^2 + c_j^2 ,$$

where $h_j^2 = a_{j1}^2 + \cdots + a_{jm}^2$ is the part of the variance due to the common factors.

Communalities, then, are defined as the common-factor variances of the
variables.
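The decomposition of unit variance into communality, specificity, and error variance can be checked numerically. The following is a minimal sketch, assuming standardized variables and uncorrelated factors; the function name and the sample loadings are illustrative and do not come from the report.

```python
import math


def variance_components(common_loadings, specific_loading, error_loading):
    """Split the variance of one standardized variable into its parts.

    common_loadings  -- a_j1, ..., a_jm (loadings on the m common factors)
    specific_loading -- b_j (loading on the specific factor)
    error_loading    -- c_j (loading on the error factor)
    """
    communality = sum(a * a for a in common_loadings)      # h_j^2
    specificity = specific_loading ** 2                    # b_j^2
    error_variance = error_loading ** 2                    # c_j^2
    uniqueness = specificity + error_variance              # a_j^2 = b_j^2 + c_j^2
    return {
        "communality": communality,
        "specificity": specificity,
        "error": error_variance,
        "uniqueness": uniqueness,
        "total": communality + uniqueness,                 # should equal 1
    }


# Hypothetical variable with two common factors.
parts = variance_components([0.6, 0.5], 0.5, math.sqrt(0.14))
```

For these hypothetical loadings the communality is 0.36 + 0.25 = 0.61, so the uniqueness must be 0.39 for the total variance to equal one.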
A set of equations giving any set of variables $\{Z_j\}$ in terms of the
m common factors and one unique factor is sometimes called a factor pattern.
Such a pattern can be presented in tabular form, e.g.

    Variable   F_1    F_2    ...   F_m    U_j
    Z_1        a_11   a_12   ...   a_1m   a_1
    Z_2        a_21   a_22   ...   a_2m   a_2
    ...
    Z_n        a_n1   a_n2   ...   a_nm   a_n

The  number  of  common  factors  included  in  such  a  description  of  a 
variable  is  the  variable  complexity.  A  factor  matrix  which  represents 
the  total  unit  variance  of  each  variable  is  the  complete  factor  matrix. 

A  factor  matrix  which  represents  only  the  common  factor  variance  of  each 
variable is the reduced factor matrix. A correlation matrix with ones in
the diagonal elements is referred to as the complete correlation matrix.

A row of the factor matrix in relation to an origin and reference
frame in (m + n)-space (factor space) will be called a variable vector.
The  re-orientation  of  this  vector  within  the  space  constitutes  the  rotation 
problem.  Techniques  for  factor  rotation  are  discussed  in  Section  V. 

4.3  TYPES  OF  FACTOR  SOLUTIONS 

Ideally a factor solution displaying a minimum complexity (i.e., a
common factor space of one dimension or two dimensions) is the goal of
the factor analyst. Such a factor pattern might look like

$$\begin{array}{cc} a_{11} & a_1 \\ a_{21} & a_2 \\ \vdots & \vdots \\ a_{n1} & a_n \end{array}$$

The uni-factor and two-factor solutions are examples of such theoretical
entities. However, the factor analyst rarely will see data which can be
accounted for so simply. Data which are not well behaved, requiring a
complex network of correlated and uncorrelated common factors as well
as a set of inconsistent unique factors, are the rule rather than the
exception. Consequently, the factor analyst generally must first decide
what he is looking for and then choose a technique which best suits his
needs. A rather short list of factoring methods is at his disposal.

As a matter of fact there are but two popular methods practiced,
differentiated significantly by the number of calculations involved. Thus the
centroid method for years has set the standard in hand computation
techniques while the principal axes method has proved itself workable using
high speed digital computers. Both methods can lead to multiple-factor
solutions. A short discussion on each of these methods follows.

Centroid Method - The centroid method of factoring exhibits what
Thurstone calls a "computational compromise" since the resulting factor
loadings are not unique for a given R. Let us assume, then, that the
original score matrix S consists of n vectors contained in m-space where
m is the number of common factors. As is well known, the correlations
between any two of the n variables are just the scalar products between
them. To obtain a vector whose m components give the centroid of the
points describing the set of common factors, we simply average the
elements in each column of the factor matrix, or

$$\left(\frac{1}{n}\sum_k a_{k1},\ \frac{1}{n}\sum_k a_{k2},\ \ldots,\ \frac{1}{n}\sum_k a_{km}\right)$$

where k = 1, 2, ..., n. We require the frame of reference to have an
axis passing through the centroid thus reducing the centroid vector to

$$\left(\frac{1}{n}\sum_k a_{k1},\ 0,\ \ldots,\ 0\right).$$

With a minimum of transposition utilizing the new axis, a general
formula can easily be derived which gives the elements of the first factor
loadings:

$$a_{j1} = \frac{S_j}{\sqrt{T}}, \qquad S_j = \sum_{k=1}^{n} r_{jk} \quad\text{and}\quad T = \sum_{j=1}^{n}\sum_{k=1}^{n} r_{jk} .$$

The residual matrix is then calculated

$$[r'_{jk}] = [r_{jk} - a_{j1}a_{k1}]$$

and the next factor loadings are calculated using

$$a_{j2} = \frac{\epsilon_j S'_j}{\sqrt{T'}}, \qquad j = 1, 2, \ldots, n,$$

where $\epsilon_j = \pm 1$, depending on necessary matrix reflections, and $S'_j$ and $T'$
are the corresponding sums taken over the reflected residual matrix.

Removal of the remaining factors follows the same pattern until the
process is ended. Interestingly enough, no dependable technique exists to
stop this sequence. However, this problem will be considered in subsection
4.5.
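The first step of the centroid method can be sketched in a few lines. The sketch below assumes the usual centroid formula, in which the first loading of variable j is the j-th row sum of the correlation matrix divided by the square root of the grand sum; the function names are illustrative, and the sign reflections needed for later factors are deliberately left out.

```python
import math


def centroid_first_factor(r):
    """First centroid loadings a_j1 = S_j / sqrt(T) for a square matrix r."""
    n = len(r)
    row_sums = [sum(r[j][k] for k in range(n)) for j in range(n)]  # S_j
    total = sum(row_sums)                                          # T
    if total <= 0:
        raise ValueError("grand sum must be positive; reflect variables first")
    root = math.sqrt(total)
    return [s / root for s in row_sums]


def residual_matrix(r, loadings):
    """Residual correlations r'_jk = r_jk - a_j1 * a_k1."""
    n = len(r)
    return [[r[j][k] - loadings[j] * loadings[k] for k in range(n)]
            for j in range(n)]


# Hypothetical rank-one matrix built from known loadings (communalities on
# the diagonal), so the centroid step should recover those loadings.
true_loadings = [0.8, 0.6, 0.5]
r = [[a * b for b in true_loadings] for a in true_loadings]
first = centroid_first_factor(r)
residual = residual_matrix(r, first)
```

Because the hypothetical matrix has exactly one common factor, the residual matrix vanishes after the first extraction. With real data the residuals would be reflected and factored again.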

Principal Axes Method - The principal axes method of factoring derives
from an ellipsoid representation where the axes of the ellipsoids correspond
to the factors. The selection of the factors occurs such that their
respective contributions to the communality decrease. In other words, the
contribution of factor one to the total communality is maximum.

Therefore, $V_1 = a_{11}^2 + a_{21}^2 + \cdots + a_{n1}^2$ is chosen as maximum under the
conditions

$$r_{jk} = \sum_{p=1}^{m} a_{jp}a_{kp}, \qquad j, k = 1, 2, \ldots, n.$$


Applying differential calculus to these conditions, the characteristic
equation of the correlation matrix R is derived:

$$\det(R - \lambda I) = 0 .$$

Solutions of the characteristic equation are, of course, the
eigenvalues of the matrix which have the following well known property,
generally expressed:


J 


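The set of eigenvectors $\{\alpha_{jp}\}$ corresponding to $\lambda_p$ then are used to obtain
the factor loadings of factor p:

$$a_{jp} = \frac{\alpha_{jp}\sqrt{\lambda_p}}{\sqrt{\alpha_{1p}^2 + \alpha_{2p}^2 + \cdots + \alpha_{np}^2}}, \qquad j = 1, \ldots, n.$$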

Of course, communalities must be estimated in this process and can affect
the solution for a small number of variables. Since a decreasing amount
of communality is extracted with each factor, an $\epsilon$ can be chosen such
that $|H^2 - h^2| \le \epsilon$ completes the factoring, where $H^2$ is the derived
approximated communality.
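As an illustration of how loadings follow from an eigenvalue and its eigenvector, the sketch below finds the largest eigenvalue of a small correlation matrix by power iteration and scales the unit eigenvector by the square root of that eigenvalue. Power iteration is used here only to keep the example self-contained; it is not the procedure described in the report, and the function name and iteration count are assumptions.

```python
import math


def first_principal_factor(r, iterations=200):
    """Return (eigenvalue, loadings) for the dominant eigenpair of r.

    Assumes r is symmetric and its dominant eigenvalue is positive,
    as is the case for a correlation matrix with unities on the diagonal.
    """
    n = len(r)
    v = [1.0 / math.sqrt(n)] * n                 # starting unit vector
    for _ in range(iterations):
        w = [sum(r[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [x / norm for x in w]                # keep the eigenvector at unit length
    rv = [sum(r[j][k] * v[k] for k in range(n)) for j in range(n)]
    eigenvalue = sum(v[j] * rv[j] for j in range(n))   # Rayleigh quotient
    if eigenvalue < 0.0:
        raise ValueError("dominant eigenvalue is negative")
    scale = math.sqrt(eigenvalue)
    # With a unit eigenvector the loading is alpha_jp * sqrt(lambda_p).
    return eigenvalue, [x * scale for x in v]


# Hypothetical two-variable correlation matrix with r_12 = 0.6.
eigenvalue, loadings = first_principal_factor([[1.0, 0.6], [0.6, 1.0]])
```

The sum of the squared loadings equals the eigenvalue, which is the factor's contribution to the total variance.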
4.4  COMMUNALITY

In most standard factor analysis packages available in computation
centers all over the country, there exist at least three alternatives
in the estimation of communalities: insertion of ones in the diagonal
of R; arbitrary choice of $h_j^2$ as the largest $r_{jk}$ in either the
jth row or the jth column (this is the technique employed by Thurstone
(Reference 25) in deriving human factors); or by using squared multiple
correlations (see Section 2.7) as communalities. Harman (Reference 2, p. 86)
states the problem as follows:

"Literally  dozens  of  methods  for  estimating  communalities 
have  been  proposed  but  none  of  them  has  been  shown  to  be 
superior  to  any  of  the  others  on  the  basis  of  closer  approxi¬ 
mation  to  the  "true"  values.  As  a  matter  of  fact  none  of 
the  methods  has  been  demonstrated  to  lead  to  minimal  rank 
of  the  correlation  matrix.  The  choice  among  the  various 
methods  of  approximation  is  generally  made  on  the  basis  of 
available  computational  facilities  and  the  disposition  of 
the  investigator  to  employ  that  method  which  intuitively 
seems  best  to  approach  the  concept  of  communality . " 

In  this  subsection  a  new  technique  to  obtain  communalities  is  pre¬ 
sented.  Let  us  first  introduce  in  more  detail  the  basic  requirements 
for  estimation  of  communalities. 

For uncorrelated factors the communality, $h_j^2$, of the jth variable
is given by the sum of the squares of the common factor coefficients, viz.,

$$h_j^2 = a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2 .$$


The  elaboration  of  this  statement  has  yielded  further  defining 
characteristics: 

1.  The  communality  may  be  defined  as  the  squared  multiple  corre¬ 
lation  of  the  given  observed  variable  on  the  common  factors . 

2.  The  squared  multiple  correlation  of  the  given  variable  on  the 
remaining  variables  must  be  the  lower  bound  to  the  communality 
(References  26  and  27). 

3.  The  communality  is  the  upper  limit  of  this  squared  multiple 
correlation  as  the  number  of  variables  approaches  infinity  (Reference  27). 

4.  Since  the  communality  is  a  variance,  its  upper  limit  is  one. 
5. Unique communalities can be obtained only when the rank m of the
matrix satisfies the following condition (References 28 and 25):

$$m \le \frac{2n + 1 - \sqrt{8n + 1}}{2} .$$
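The rank condition in item 5 is easy to tabulate for a given number of variables. The following sketch evaluates the bound (2n + 1 - sqrt(8n + 1)) / 2 and checks a proposed number of factors against it; the function names are illustrative.

```python
import math


def rank_bound(n):
    """Upper bound on the rank m for n variables: (2n + 1 - sqrt(8n + 1)) / 2."""
    return (2 * n + 1 - math.sqrt(8 * n + 1)) / 2


def satisfies_rank_condition(m, n):
    """True when m common factors do not exceed the bound for n variables."""
    return m <= rank_bound(n) + 1e-12   # small tolerance for rounding


bound_for_six = rank_bound(6)   # (13 - 7) / 2 = 3
```

For six variables the bound is three, so a two-factor or three-factor solution meets the condition while a four-factor solution does not.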

These  requirements  yield  properties  which  have  been  stated  as  follows 
(Reference  29): 

1. The obtained communalities must be within the following boundaries:

$$0 \le R_j^2 \le h_j^2 \le 1 .$$

2.  The  factor  loading  matrix  should  reproduce  the  reduced  correlation 
matrix  exactly. 

3.  Minimum  rank  should  be  attained. 

4.  The  reduced  correlation  matrix  should  be  Gramian. 

When the principal factor method is used, properties 2 and 4 can be shown
to be equivalent.

Guttman  (Reference  30)  has  shown  that  diagonal  values  which  reduce 
rank  may  not  satisfy  other  requirements  for  communalities .  The  Heywood 
case  (Reference  31)  is  the  classic  example.  Moreover,  the  statement  often 
made  that  the  rank  of  any  symmetric  matrix  with  even  random  elements  can 
always  be  reduced  to  a  certain  degree  by  choosing  diagonal  values  has  been 
shown  to  be  false  (Reference  29).  The  proof  is  based  on  the  impossibility 
of  assuring  real  solutions  to  systems  of  nonlinear  equations  with  real 
coefficients.  From  intuitive  considerations  of  experiment  design,  it  is 
to  be  expected  that  the  number  of  factors  causing  variance  among  the 
variables  is  even  greater  than  the  number  of  variables .  Other  minor  factors 
cause  variance  in  the  measure  of  variables  intended  to  measure  major  factors. 

Thus the attempt to find diagonal values which reduce rank must end in
only some sort of approximation. But rank reduction is basic to a
parsimonious explanation of the variance of the variables, and different
approaches may be taken to the problem of approximating rank reduction.

For example, one of the few communality estimates based on rationale, the
method of triads, uses average diagonal values which seek to force
determinants of submatrices approximately to zero. On the other hand, the
refactoring method simply postulates the number of factors. Many so-called
"estimates" of communality do not even consider rank reduction.

Then  from  the  foregoing  statements  we  may  distill  a  refined  definition 
of  the  communality  problem: 

Find diagonal values $h_j^2$ such that $0 \le R_j^2 \le h_j^2 \le 1$,
and such that the correlation matrix with these diagonal
values is Gramian. Moreover with these diagonal values, a
higher percentage of common factor variance is explained
with fewer factors than with any other diagonal values.

A  method  for  computing  diagonal  values  which  attempts  to  satisfy  this 
definition  is  described  in  the  sequel. 

A. The Method

If a symmetric matrix A is bordered by the column U, the row U*,
and the scalar a, then the eigenvalues $\lambda$ of

$$\begin{pmatrix} A & U \\ U^* & a \end{pmatrix}$$

satisfy the equation

$$\lambda - a = \sum_{i=1}^{n} \frac{(U \mid X_i)^2}{\lambda - \lambda_i} \qquad (2)$$

where $X_i$ is the unit eigenvector corresponding to the eigenvalue $\lambda_i$
of the n x n matrix A (Reference 32, p. 27).

Since the rank of a matrix is the order of the matrix minus the number
of zero eigenvalues, to reduce the rank we must have zero eigenvalues.
Then, in view of Equation 2, a necessary condition for zero eigenvalues
is that

$$a = \sum_{i=1}^{n} \frac{(U \mid X_i)^2}{\lambda_i} . \qquad (3)$$
When $\lambda_i = 0$, it can be shown that $(U \mid X_i) = 0$, thus

$$\lim_{\lambda_i \to 0} \frac{(U \mid X_i)^2}{\lambda_i} = 0 .$$

Thus the terms in Equation 3 corresponding to zero eigenvalues of A
may be elided.
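A small numerical check makes Equation 3 concrete. The sketch below assumes that the eigenvalues and unit eigenvectors of A are already available (here A is taken diagonal, so its eigenvectors are the coordinate axes), computes the bordering scalar a, and verifies that the bordered matrix is singular. The function names, the `eps` cutoff argument, and the sample numbers are illustrative.

```python
def bordering_scalar(eigenvalues, eigenvectors, u, eps=0.0):
    """Scalar a = sum_i (U|X_i)^2 / lambda_i, skipping terms with |lambda_i| <= eps."""
    a = 0.0
    for lam, x in zip(eigenvalues, eigenvectors):
        if abs(lam) <= eps:
            continue                      # terms for (near) zero eigenvalues are elided
        projection = sum(ui * xi for ui, xi in zip(u, x))   # inner product (U|X_i)
        a += projection * projection / lam
    return a


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical 2 x 2 diagonal matrix A with eigenvalues 2.0 and 0.5.
eigenvalues = [2.0, 0.5]
eigenvectors = [[1.0, 0.0], [0.0, 1.0]]
u = [0.6, 0.3]
a = bordering_scalar(eigenvalues, eigenvectors, u)   # 0.36/2.0 + 0.09/0.5
bordered = [[2.0, 0.0, 0.6],
            [0.0, 0.5, 0.3],
            [0.6, 0.3, a]]
```

With this choice of a the determinant of the bordered matrix vanishes, so the bordered matrix has a zero eigenvalue and its rank is one less than its order.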

The foregoing scheme will be used to find each diagonal element.
Before presenting the algorithm formally, a theorem on transforming
eigenvalues is needed. Two diagonal elements of R, $R_{kk}$ and $R_{nn}$,
may be interchanged by the transformation

$$R_k = I_kRI_k$$

where $I_k$ is the identity matrix with the kth and nth rows (or columns)
interchanged. The use to be made of this transformation rests on

Theorem 4.3: R and $I_kRI_k$ have the same eigenvalues.

Proof: R may be diagonalized by an orthogonal transformation P by

$$R = P^*\Lambda P$$

where $\Lambda$ is the diagonal matrix of eigenvalues of R.

Now we may consider $R_k = I_kRI_k$. Using the facts that $I_kI_k = I$ and
$I_k^* = I_k$ we have

$$R_k = I_kRI_k = I_kP^*\Lambda PI_k = (PI_k)^*\Lambda(PI_k) = P_1^*\Lambda P_1 .$$

Thus $R_k$ is diagonalized by the orthogonal matrix $P_1$ with the same
diagonal eigenvalue matrix $\Lambda$. Moreover, the eigenvectors are also
permuted since

$$RX = \lambda X, \qquad I_kRI_k(I_kX) = \lambda I_kX$$

and $(X_i \mid X_i) = 1$.

Both unities and squared multiple correlations have been used as
initial diagonal values. After the diagonal values have been found for
all k, the process is repeated until all diagonal values are stable.
Diagonal values are used (i.e., replace old values) as soon as they are
calculated. In practice, convergence is enhanced by omitting terms in
Equation 3 for which $|\lambda_i| < \epsilon = .05$. The final result of the method is
a clustering of eigenvalues about zero. Thus there are small negative
eigenvalues. For the sake of interpretation Gramian properties are not
necessary. However when data reduction is the object of the factor
analysis, Gramian properties may be restored by adding the absolute
value of the negative eigenvalue with the largest absolute value to each
element on the diagonal. In proof we write

$$R = P^*\Lambda P$$

where $\Lambda$ is the diagonal matrix of eigenvalues of R.
Let $\Lambda_n$ be the scalar matrix $\lambda_n I$, where $\lambda_n$ is the negative eigenvalue
with the largest absolute value; then

$$R - \Lambda_n = P^*\Lambda P - \Lambda_n .$$

But since $P^*P = I$ and P commutes with $\Lambda_n$,

$$R - \Lambda_n = P^*\Lambda P - P^*\Lambda_n P = P^*(\Lambda - \Lambda_n)P .$$

Thus the elements of the diagonal matrix $\Lambda - \Lambda_n$ are the eigenvalues of
$R - \Lambda_n$. But the elements of $\Lambda - \Lambda_n$ are all positive or zero. Therefore
$R - \Lambda_n$ is Gramian. However, this method for forcing Gramian properties
may lead to communalities larger than 1.
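The diagonal shift can be illustrated with a two-variable case, where the eigenvalues are available in closed form. The sketch below is an illustration under that simplifying assumption only; the function names and the numbers are hypothetical, and a full implementation would need a general eigenvalue routine.

```python
import math


def eigenvalues_2x2(d1, r, d2):
    """Eigenvalues (smallest, largest) of the symmetric matrix [[d1, r], [r, d2]]."""
    mean = (d1 + d2) / 2.0
    radius = math.sqrt(((d1 - d2) / 2.0) ** 2 + r * r)
    return mean - radius, mean + radius


def restore_gramian_2x2(d1, r, d2):
    """Add |most negative eigenvalue| to both diagonal elements, if one exists."""
    smallest, _ = eigenvalues_2x2(d1, r, d2)
    shift = -smallest if smallest < 0.0 else 0.0
    return d1 + shift, r, d2 + shift


# Hypothetical reduced matrix with diagonal values 1.0 and 0.2 and r = 0.9.
before = eigenvalues_2x2(1.0, 0.9, 0.2)
shifted = restore_gramian_2x2(1.0, 0.9, 0.2)
after = eigenvalues_2x2(*shifted)
```

After the shift the smallest eigenvalue is zero, so the matrix is Gramian, but the first diagonal value now exceeds one, which is the drawback noted in the text.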

To better understand how the above bordering scheme may drive the
eigenvalues of R to zero and thus reduce the rank, let us plot on the
same graph each side of Equation 2 as a function of $\lambda$ (see Figure 2).

The solid graph is the right hand side, and the dotted graph is the
left hand side of Equation 2. The eigenvalues $\lambda$ of the bordered matrix
occur at the intersections of the sets of curves. Notice that the
eigenvalue of the smaller matrix always lies between two eigenvalues
of the larger matrix. Each of the dotted lines corresponds to a
different  choice  of  a.  The  uppermost  dotted  line  corresponds  to  the 
a  chosen  according  to  Equation  3,  in  which  case  we  have  an  eigenvalue 
of  zero.  The  observed  effect  of  reapplying  the  algorithm  after  trans¬ 
formation  of  the  matrix  (i.e.,  finding  a  new  diagonal  element)  is  to 
shift  positive  eigenvalues  to  the  left  (closer  to  zero)  and  negative 
eigenvalues  to  the  right  (closer  to  zero).  A  formal  deductive  proof 
of  convergence  has  not  yet  been  found;  however  the  success  of  the 
algorithm  in  solving  the  communality  problem  is  exhibited  in  the  following 
examples . 
B. Examples

Example 1. Six Hypothetical Variables: Harman (Reference 2, p. 91)
used these variables to illustrate communality estimates by various
methods. The results of applying the method described in this paper
are shown by plotting the eigenvalues of the 6 x 6 correlation matrix
with calculated diagonal values (Figure 3). The eigenvalues obtained
using unities and squared multiple correlations are also plotted in
Figure 3. In this example and in every other application of the
method described here, the following inequalities have held:

$$\lambda_i(R^2) < \lambda_i(d^2) < \lambda_i(1), \qquad i = 1, \ldots, n$$

$$R_i^2 < d_i^2 < 1, \qquad i = 1, \ldots, n$$

where $d_i^2$ are the calculated diagonal elements and $R_i^2$ are squared
multiple correlations. The calculated diagonal elements are "true"
communalities in the sense that the correlation matrix was constructed
to attain rank two with these values. No communality estimate presented
by Harman found these values.

Example 2. Thirteen Psychological Variables: The data for this example
were also taken from Harman (Reference 2, p. 137). However these variables
are experimental rather than hypothetical. The plots of the three sets of
eigenvalues are shown in Figure 4. These variables were well chosen to
illustrate three major factors as clearly seen in Figure 4. However, it
would only be accidentally possible to find diagonal values which would
yield ten zero eigenvalues (i.e., a rank three correlation matrix).

Example 3. 16 Hypothetical Variables: A 16 x 16 matrix was constructed
by squaring a 16 x 4 matrix of random elements with normalized columns.
Thus the 16 x 16 matrix was of rank 4 when the constructed diagonal
elements were retained. The proposed method found these "true
communalities" given the constructed matrix with unities on the diagonal. The
plots of eigenvalues are shown in Figure 5.
Figure 3. Eigenvalues of 6 x 6 Matrix


C.  Conclusions 

The  proposed  method  found  diagonal  elements  which  satisfied  the 
refined  definition  of  communality  in  all  cases  where  an  exact  reduced 
rank  was  known  and  possible.  In  the  other  cases,  the  method  found  diagonal 
elements  which  satisfied  the  definition  better  (from  the  eigenvalue  point 
of  view)  than  did  either  squared  multiple  correlations  or  unities. 

The  method  converged  to  "true  communality"  when  either  squared  multi¬ 
ple  correlations  or  unities  were  placed  on  the  diagonal  initially.  However, 
the  process  converged  faster  with  squared  multiple  correlations  as  initial 
values. 

From  a  study  of  eigenvalue  plots  in  several  cases,  it  would  appear 
that squared multiple correlation is a very good estimate of communality
when  there  are  only  a  few  well-defined  major  factors.  That  is,  either 
estimates  of  communalities  are  calculated  by  the  method  presented  here 
using  unities  or  squared  multiple  correlations  as  initial  values  for  the 
method,  or  squared  multiple  correlations  are  themselves  used  as  estimates 
of  communalities.  When  factor  analysis  is  used  for  the  purpose  of  inter¬ 
pretation,  the  factor  loadings  are  used  to  indicate  which  variables  to 
associate  with  which  factors.  And  the  sets  of  associations  are  the  same 
whether  the  factor  loadings  are  obtained  from  a  final  reduced  correlation 
matrix  with  communalities  on  the  diagonal  or  squared  multiple  correlations 
on  the  diagonal.  Thus  squared  multiple  correlations  are  sufficiently  close 
to  true  communalities  to  distinguish  major  factors  when  they  exist. 

4.5  COMPLETENESS  OF  FACTORIZATION 

In  factoring  a  correlation  matrix  no  unique  test  as  an  answer  to  the 
question  "when  to  stop  factoring?"  has  yet  been  developed.  At  present 
there  exist  several  methods  which  are  applied  with  more  or  less  success. 

A  few  comparative  or  survey  studies  of  some  of  these  methods  are 
available:  Mosier  (Reference  33)  studies  six  different  tests  for 
completeness  of  factorization,  applying  them  to  one  correlation  matrix. 
Cattell  (Reference  24)  lists  and  evaluates  eleven  tests.  Burt 
(Reference  34)  summarizes,  under  the  topic  of  "tests  of  significance 
in  factor  analysis"  many  of  the  existing  methods.  Fruchter 
(Reference  35)  comparatively  evaluates  various  tests,  applying  them 
to  one  or  more  concrete  cases.  In  the  most  recent  survey  paper  Sokal 
(Reference  36)  evaluates  comparatively  five  tests  applying  them  to 


Thurstone's  box  measurements,  an  artificial  correlation  matrix,  a 
psychological  matrix  and  a  biological  matrix,  using  centroid  factor 
extraction.  The  present  investigation  compiles  methods,  which  are  used 
by  factor  analysts,  in  the  form  of  a  quick  reference,  listed  in  a 
systematic  way. 

Before  leaving  this  introductory  part  let  us  make  two  remarks: 

(a)  Cattell  (Reference  24)  and  also  Burt  (Reference  34)  and 
Fruchter  (Reference  35)  suggest,  that  if  one  wants  to  rotate,  it 
pays  off  to  extract  one  or  two  more  factors  than  necessary  after 
application  of  any  of  the  completeness  tests,  since  one  obtains  more 
accurate  results.  Several  workers  also  suggest  applying  more  than  one 
criterion  and  deciding  upon  the  number  of  factors  on  the  basis  of  the 
results  of  all  the  criteria. 

(b)  Obviously  a  solution  to  the  communality  problem  together  with 
the  simultaneous  knowledge  of  the  rank  will  also  resolve  the  completeness 
problem.  The  technique  described  in  4.4  presents  such  a  solution.  Since 
it  is  a  converging  process  the  adequacy  of  the  factor  solution  of  the 
original  correlation  matrix  may  then  be  shown  by  any  of  the  following 
tests.  It  is  suggested  to  then  use  one  of  the  statistical  tests,  in  order 
not  to  bring  an  empirically  approximate  view  into  the  mathematically  sound 
picture  of  the  applied  method. 

For  reference  let  us  set  up  the  following  list  of  methods  to  test 
completeness  of  factorization: 

A.  Empirical  completeness  tests 

1.  Percentage  tests 

2.  Tucker's  test 

3.  Cattell's  scree-test

4.  Kaiser's  test 

B.  Significance  tests  for  completeness 

1.  Tests  for  joint  significance  of  residuals 

a.  McNemar's  test 

b.  Saunders'  test

2.  Tests  for  individual  significance  of  residuals 

a.  Test  by  means  of  standard  error  formula  for 
the  final  residuals 

b.  Sokal's  test 
3.  Burt's chi-squared test

4.  Lawley's chi-squared test

C.  Miscellaneous tests for completeness

1.  Index of completeness of factorization

2.  Listing of other completeness tests

A.  Empirical  Completeness  Tests 
1.  Percentage  Tests 

A  practical  and  commonly  used  test  for  completeness  of  factorization 
considers  percentages  of  total  communality,  accounted  for  by  the  factors. 
The  tests  can  be  conducted  under  different  aspects: 

(1)  Determine  in  advance  to  analyze  up  to,  say,  50%  of  the  total 
variance,  or  a  suitable  proportion  of  the  total  reliability  (leaving  a 
proportion  for  the  specificity). 

(2)  Determine  in  advance  that  a  factor  which  accounts  for  less 
than,  say,  5%  of  the  total  variance  will  not  have  any  practical  signifi¬ 
cance  in  the  sense  of  being  identifiable. 

(3)  Extract  factors  and  if,  after,  say,  90%  of  the  total  communality 
or  total  variance  have  been  accounted  for,  a  factor  accounts  for  only 

2%  of  these  totals,  do  not  retain  it  in  the  set  of  factors. 

The percentage tests are especially handy for the principal factor
solution since the contribution of the factors to the total variance or
total communality decreases with each successively extracted factor.

One could then stop factoring after one reaches a factor which accounts
for, say, 5% of these totals. One knows that the next factor which could
be extracted would contribute less than 5% to the totals.

There is one more simplifying aspect of the principal factor solution.
The total contribution

$$\sum_{j=1}^{n} a_{jp}^2$$

of factor $F_p$ to the total variance or total communality, which is equal
to the trace of the correlation matrix, is equal to the eigenvalue $\lambda_p$.
The effect of each factor contribution to these totals
can therefore be computed easily as the eigenvalue's percentage of the trace.
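For a principal factor solution the percentage tests reduce to simple arithmetic on the eigenvalues. The sketch below expresses each eigenvalue as a percentage of the trace and counts the factors that reach a chosen minimum share; the 5% default mirrors the figure used as an example in the text, and the function names and sample eigenvalues are illustrative.

```python
def percentage_contributions(eigenvalues):
    """Each eigenvalue as a percentage of the trace (the sum of all eigenvalues)."""
    trace = sum(eigenvalues)
    return [100.0 * lam / trace for lam in eigenvalues]


def factors_to_retain(eigenvalues, min_percent=5.0):
    """Count leading factors whose share of the trace is at least min_percent."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = 0
    for share in percentage_contributions(ordered):
        if share < min_percent:
            break            # every later factor contributes even less
        kept += 1
    return kept


# Hypothetical eigenvalues with trace 5.0: shares of 60%, 30%, 6% and 4%.
shares = percentage_contributions([3.0, 1.5, 0.3, 0.2])
retained = factors_to_retain([3.0, 1.5, 0.3, 0.2])
```

With these hypothetical eigenvalues three factors reach the 5% share and the fourth does not.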
2.  Tucker's  Test 

Denote by $\sum|\rho_k|$ the sum of absolute residuals of the n x n
correlation matrix after k factors have been extracted.

Tucker's test (Reference 37):

If 

~\A  kt  r  , 

then the (k+1)-factor is considered to be insignificant.

Tucker's  criterion  after  a  modification  by  Blakey  (Reference  38): 

If

$$\frac{\sum|\rho_{k+1}|}{\sum|\rho_k|} \ge \frac{n - 1}{n + 1} ,$$

then the (k+1)-factor is considered to be insignificant.
Remarks:

(a) $\sum|\rho_k|$ and $\sum|\rho_{k+1}|$ include the communality residuals.
Sokal (Reference 36) states that it is desirable to use re-estimated
communalities in place of residual ones in the denominator; but since
the difference between residual and re-estimated diagonal values is
usually slight, it is not of great importance what values are used in
the main diagonal.

(b) Cattell (Reference 24) considers Tucker's test as one of
the most reliable and practical ones of the really quick tests of
completeness, though it sometimes can give strange results, since
the value of the ratio can decrease or increase after extraction
of factors instead of increasing steadily.

Sokal,  using  the  second  form  of  the  criterion  in  a  comparative 
study  of  five  tests  for  completeness  of  factorization,  considers  the 
test  as  not  very  suitable  as  a  strong  and  fast  criterion.  Empirical 
investigations by McNemar (Reference 39) and theoretical investigations
by Burt (Reference 34) support his standpoint. Burt criticized
Tucker's  test  as  making  no  allowance  for  the  number  of  variables  and 
the  number  of  factors  extracted  and  as  making  no  explicit  reference 
to  the  size  of  the  sample.  He  considers  the  test  as  marking  too 
many  factors  as  insignificant. 

Tucker's  criterion  has  actually  been  employed  by  more  factorists 
than  any  other  criterion. 

3.  Cattell's  Scree-test  (for  a  principal  factor  solution) 

Starting with the largest, each eigenvalue is plotted in an x-y
coordinate system, its number (rank order) versus its magnitude. Then the curve
through  these  points  is  examined.  If  the  number  of  factors,  m, 
is  less  than  the  number  of  variables,  n,  n-m  eigenvalues  of  the 
correlation  matrix  will  be  zero  or  at  least  close  to  zero,  lying 
on  a  straight  line  almost  parallel  to  the  x-axis.  The  test  consists 
in  determining  that  point,  where  the  curve  breaks  off  the  straight 
line.  The  number  of  eigenvalues  determining  the  left  part  of  the 
curve  yields  the  number  m  of  factors . 

4.  Kaiser's  Test  (Reference  40) 

Upon extensive studies of correlation matrices with unities in
the main diagonal Kaiser suggests as a practical basis for determining
the number of common factors the number equal to the number of
eigenvalues greater than one. Kaiser found that this number amounts
to about a sixth to a third of the total number of variables.
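Kaiser's rule is a one-line count once the eigenvalues of the correlation matrix with unities on the diagonal are known. The sketch below assumes the eigenvalues have already been computed; the function name and the sample values are illustrative.

```python
def kaiser_factor_count(eigenvalues):
    """Number of eigenvalues strictly greater than one."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
count = kaiser_factor_count([2.9, 1.4, 0.7, 0.5, 0.3, 0.2])
```

Here two of the six eigenvalues exceed one, so the rule suggests two common factors, a third of the number of variables.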

B.  Significance  Tests  for  Completeness 
1.  Tests  for  Joint  Significance  of  Residuals 
a. McNemar's test

Let $\sigma_k$ denote the observed standard deviation of the
residuals (disregarding diagonal values) after extraction of k
factors. Let $M_{h^2}$ denote the mean of the communalities (computed
from k factors). Then $1 - M_{h^2}$ is the average uniqueness. If

$$\frac{\sigma_k}{1 - M_{h^2}} \le \frac{1}{\sqrt{N}} ,$$

N being the number of observations, all significant factors have
been extracted.

Remarks: 

(a)  McNemar's  criterion  is  an  attempt  to  test  the  significance 

of  the  residuals  after  k  factors  have  been  extracted  from  the  correla¬ 
tion  matrix.  He  used  the  centroid  solution  for  his  derivations.  In  the 
beginning  years  of  factor  analysis  an  attempt  to  do  so  was  made  by  com¬ 
paring  the  standard  deviation  of  the  residual  correlations  with  the 
standard error of the original correlations. This device, though, is
not adequate since residual correlations are analogous to partial
correlations (the factors being held constant) and should for this
purpose be divided by the geometric mean of the uniquenesses of the
two variables under consideration. To reach his goal to test the
significance of the residuals after k factors have been removed
from the correlation matrix, McNemar approximates the standard deviation
of the residuals or partial correlations by

$$\frac{\sigma_k}{1 - M_{h^2}} .$$

Cattell  (Reference  24)  reasons  on  the  basis  of  experience  that 
McNemar’s  test  tends  to  stop  factorization  too  early.  Sokal  (Reference  36) 
concludes  from  his  studies  that  McNemar’s  test  yields  interpretable 
results except for problems with very large sample size N and low
uniquenesses (that is, high communalities), in these cases indicating
more than the true number of factors. In this respect it is worth
noting that the test mainly takes into account the sample size N.

(b) Burt (Reference 34) suggests along the same line a procedure,
which, as he says, is more satisfactory by not using residuals but
converted residuals (see Sokal's test), squaring them and summing them
up. Then, if N is large, this sum will be approximately distributed
as chi-squared. So, he suggests determining the significance of any
particular set of residuals by referring to the $\chi^2$ table with
$\tfrac{1}{2}n(n - 1) - kn + \tfrac{1}{2}k(k - 1)$ degrees of freedom.


b. Saunders' Test

Let $\sum \rho^2$ denote the sum of the squared residuals of the n x n
correlation matrix after k factors have been extracted. Let N be the
sample size and denote by

$$\sum_{j=1}^{n} \sum_{i=1}^{k} a_{ji}^2$$

the sum of all n·k squared loadings taken from the unrotated matrix.

The test can take on two forms: If

[inequality illegible in source]

after the k-th factor has been computed, then the factor extraction is
complete. If the reliability coefficients of the variables are
available, the test can be stated as: If

[inequality illegible in source]

after the k-th factor has been computed, then the factor extraction
is complete.


Remarks:

(a) It is advisable not to include the diagonal residuals in
$\sum \rho^2$ unless one is sure of exceptionally good communality
estimates. If the communality residuals are excluded from the summation
one has to multiply $\sum \rho^2$ by $\frac{n}{n-1}$ to bring it to the
equivalent of a whole matrix.

(b) Saunders (Reference 41) claims his formula is an improvement
over McNemar's test since it takes into account the sample size,
the number of variables, the reliabilities and especially the number
of factors.

(c) Sokal (Reference 36), applying Saunders' test to his four
matrices, obtains results similar to those obtained by McNemar's
criterion. He again finds the apparent influences of large sample sizes
or high communalities on the results.

2. Tests for Individual Significance of Residuals.

a. Test by means of standard error formula for the final residuals

Two approximate standard error formulas can be employed to decide
upon the significance or insignificance of any residual after any
number of factors has been extracted from the original correlation
matrix.

(1) Theoretically it should be: R = AA'. Extracting common
factors, R will only be reproduced by AA' approximately. How
good this approximation is, or in other words, how complete factorization
is, can be judged on the basis of the residual matrix $\bar{R}$,
$R = \bar{R} + AA'$.

Each element of $\bar{R}$, the final residual correlations, should be
approximately zero in size, since, when all common factors have been
extracted, no further correlation should exist between the variables.
Let us assume, therefore, that the distribution of the residuals is
similar to that of a zero-correlation in a sample of equal size. Then
denote by $\sigma_{\bar{r}}$ the standard deviation of the series of
residuals and by $\sigma_{r=0}$ the standard error of a
zero-correlation. Under the above assumptions it would then
be necessary as a test for completeness to determine if

$$\sigma_{\bar{r}} \le \sigma_{r=0} = \frac{1}{\sqrt{N-1}}$$

or, since N is usually large, if

$$\sigma_{\bar{r}} \le \frac{1}{\sqrt{N}}.$$
From the application of this test, which depends on the sample size
alone and is therefore rather crude, one may conclude: If

$$\sigma_{\bar{r}} > \frac{1}{\sqrt{N}}$$

to an appreciable extent: further linkages between variables may
exist; further factorization may be necessary. If

$$\sigma_{\bar{r}} < \frac{1}{\sqrt{N}}$$

to an appreciable extent: unjustified linkages between variables were
factorized.
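
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the residual matrix is hypothetical, the standard deviation is taken over the off-diagonal residuals with the population (divide by count) formula, and the function names are not from the report.

```python
import math

def residual_sd(residuals):
    """Standard deviation of the off-diagonal residuals (upper triangle only)."""
    n = len(residuals)
    values = [residuals[i][j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def zero_correlation_check(residuals, n_obs):
    """Return (sd of residuals, 1/sqrt(N-1), True if sd <= standard error)."""
    sd = residual_sd(residuals)
    se_zero = 1.0 / math.sqrt(n_obs - 1)
    return sd, se_zero, sd <= se_zero

# Hypothetical 3 x 3 residual matrix after some factors have been extracted.
residuals = [
    [0.00, 0.02, -0.03],
    [0.02, 0.00, 0.01],
    [-0.03, 0.01, 0.00],
]
sd, se_zero, complete = zero_correlation_check(residuals, n_obs=145)
print(f"sd = {sd:.4f}, standard error = {se_zero:.4f}, complete = {complete}")
```

A standard deviation well below the standard error would, by the second rule above, suggest that factorization was carried too far rather than not far enough.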

The  above  test  can  be  found  in  Holzinger  and  Harman  (Reference  42) 
and  in  Harman  (Reference  2).  Applications  can  also  be  found  in  these 
texts.  Similar  formulae  have  been  proposed  by  Kelley  (Reference  43) 
and  Thurstone  (Reference  44). 

(2) Holzinger and Harman (Reference 42) have derived a standard
error formula for a residual after any number of factors has been
extracted from the correlation matrix.

Denote by $r_{ij}$ the observed correlation between variables i
and j, by $\bar{r}_{ij}$ the residual after extraction of m+1 factors,
and by $\sigma_{a_{is}}$ and $\sigma_{a_{js}}$ the standard errors of
factor loadings; then

$$\sigma^2_{\bar{r}_{ij}} = \sigma^2_{r_{ij}} + \sum_{s=0}^{m} \left( a_{is}^2 \, \sigma^2_{a_{js}} + a_{js}^2 \, \sigma^2_{a_{is}} \right).$$

This formula, however, cannot be applied to a residual obtained from
any solution, since the standard errors $\sigma_{a_{is}}$ and
$\sigma_{a_{js}}$ are only known for the two-factor and bi-factor
solutions.

In approximating the above formula, so that it does not explicitly
contain the standard errors of the loadings, the assumption is made
that all observed correlations can be well enough described by their
average, computed by

$$\bar{r} = \frac{1}{\binom{n}{2}} \sum_{i>j} r_{ij} \qquad (i, j = 1, \ldots, n).$$

And if $\rho_s$ denotes the average residual correlation used for
computing loadings of the factor $F_s$, the approximate formula after
extraction of m+1 factors is of the form:


Z.  -  (1  -  p)2  (5  +  8p  +  2p2)  1  ?  i3 

OM  +  M  i  I  9 


Ks=ir 


ps  -  5“s 


These  standard  errors  are  tabulated  in  References  42  and  2.  Applications 
can  also  be  found  there. 


It should be noted that the approximations necessary to arrive at the
above formula usually make the $\sigma_{\bar{r}}$-value smaller; so, in
order to take this fact into account, a residual which is twice its
standard error can still be considered insignificantly different from
zero.

b. Sokal's Test

In the following test each single residual is tested for
insignificance. Denote by $\rho_{ij.k}$ the residual correlation
between variables i and j after k factors have been extracted. Let
$u_{ik}^2$ denote the uniqueness of variable i after extraction of k
factors:

$$u_{ik}^2 = 1 - \sum_{s=1}^{k} a_{is}^2.$$

Convert the residuals to quantities analogous to partial correlations
(factors 1 through k kept constant) by dividing them by the geometric
mean of the uniquenesses of the variables under consideration. Name the
converted residuals $r_{ij.12 \ldots k}$. So

$$r_{ij.12 \ldots k} = \frac{\rho_{ij.k}}{u_{ik} \, u_{jk}}.$$

Assume that the $r_{ij.12 \ldots k}$ have the same sampling
distribution as ordinary partial correlation coefficients. Under this
assumption test each converted residual against the minimum significant
partial correlation, denoted by $r_{m.s.}$, obtained from table IV,
Fisher and Yates (Reference 45): If

$$r_{ij.12 \ldots k} < r_{m.s.}$$

at a presumed level of significance with N - (k+1) degrees of freedom,
then $r_{ij.12 \ldots k}$ is insignificant.
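
The conversion and comparison steps of this test can be sketched as follows. The sketch assumes that the minimum significant partial correlation has already been read from a table for N - (k+1) degrees of freedom and is passed in as `r_min_sig`; the function names and the example values are hypothetical.

```python
import math

def converted_residuals(residuals, uniqueness):
    """Divide each residual by the geometric mean of the two uniquenesses.

    `uniqueness[i]` is the uniqueness u_i^2 of variable i after k factors.
    """
    n = len(residuals)
    converted = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                converted[i][j] = residuals[i][j] / math.sqrt(
                    uniqueness[i] * uniqueness[j]
                )
    return converted

def significant_residuals(residuals, uniqueness, r_min_sig):
    """Return the pairs (i, j), i < j, whose converted residual is significant."""
    converted = converted_residuals(residuals, uniqueness)
    n = len(residuals)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(converted[i][j]) >= r_min_sig
    ]

# Hypothetical residuals and uniquenesses for three variables.
residuals = [
    [0.00, 0.05, 0.20],
    [0.05, 0.00, 0.02],
    [0.20, 0.02, 0.00],
]
uniqueness = [0.5, 0.4, 0.6]
converted = converted_residuals(residuals, uniqueness)
significant = significant_residuals(residuals, uniqueness, r_min_sig=0.25)
print(significant)
```

If the returned list is empty, no residual is significant and the extraction would be judged complete under this test.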

Remarks:

(a) The rather laborious work to conduct the test on each residual
can be simplified by excluding certain residuals from the test. This
is done by the following procedure: for a presumed significance level,
$r_{m.s.}^2$ can be determined as well as the lowest two uniquenesses,
denoted by ${}_{m}u_{ik}^2$ and ${}_{m}u_{jk}^2$. Then from

$$r_{m.s.}^2 \; {}_{m}u_{ik}^2 \; {}_{m}u_{jk}^2 = \rho_{m.s.}^2$$

$\rho_{m.s.}$ can be determined, and all values

$$|\rho_{ij.k}| < \rho_{m.s.}$$

are certainly insignificant. That means, for the test only values

$$|\rho_{ij.k}| > \rho_{m.s.}$$

have to be considered.


(b) Sokal (Reference 36) discusses this completeness test in his
comparative study, mentioning also some computing simplifications. He
obtains his results by judging the elements of the residual matrix by the
above described significance test and by the "importance" test, that
is, by counting the number of partial correlations larger than an
arbitrary 0.05 (disregarding the sign) that remain in the matrix
after k factors have been extracted, naming those correlations
important. On the basis of his study he recommends these procedures
to test for completeness because of the statistical basis of the
significance test and the apparently consistent results.

(c) In one of his early tests Burt (Reference 46) started from
the same considerations as Sokal, defining $r_{ij.12 \ldots k}$ and
testing it against the standard error of a zero partial correlation,
$\frac{1}{\sqrt{N}}$; then the test is given by

$$r_{ij.12 \ldots k} = \frac{\rho_{ij.k}}{u_{ik} \, u_{jk}} < \frac{1}{\sqrt{N}}.$$


3. Burt's Chi-squared Test with Z-transformation

Theoretically, it is R = AA'. Test the significance of the
differences between the elements of R and of AA' after k factors
have been extracted. Let $\bar{Z}$ denote the elements of R transformed
by Fisher's Z-transformation and let Z denote the elements of AA',
also transformed by Fisher's Z. Sum $(\bar{Z} - Z)^2$ over the upper or
lower triangles (without diagonals) of the respective matrices. If N is
the sample size and n the number of variables, the test of significance
is expressed by: If

$$\chi^2 = (N - 3) \sum (\bar{Z} - Z)^2$$

with $\frac{1}{2} n(n-1) - kn + \frac{1}{2} k(k-1)$ degrees of freedom
is insignificant at a presumed level of significance, the factor
extraction is assumed to be completed.
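
A minimal sketch of this statistic is given below, assuming the observed and reproduced correlations are supplied as full symmetric matrices with off-diagonal entries strictly between -1 and 1. The function names and the example matrices are hypothetical, and the critical value must still be looked up in a chi-squared table.

```python
import math

def burt_degrees_of_freedom(n, k):
    """n(n-1)/2 - kn + k(k-1)/2 for n variables and k extracted factors."""
    return n * (n - 1) // 2 - k * n + k * (k - 1) // 2

def burt_chi_squared(observed, reproduced, n_obs, k):
    """(N - 3) times the sum of squared differences of Fisher-transformed entries."""
    n = len(observed)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle, diagonal excluded
            diff = math.atanh(observed[i][j]) - math.atanh(reproduced[i][j])
            total += diff ** 2
    return (n_obs - 3) * total, burt_degrees_of_freedom(n, k)

# Hypothetical observed correlations and one-factor reproduction, four variables.
observed = [
    [1.00, 0.42, 0.35, 0.30],
    [0.42, 1.00, 0.28, 0.25],
    [0.35, 0.28, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]
reproduced = [
    [1.00, 0.40, 0.36, 0.28],
    [0.40, 1.00, 0.30, 0.24],
    [0.36, 0.30, 1.00, 0.21],
    [0.28, 0.24, 0.21, 1.00],
]
chi2, df = burt_chi_squared(observed, reproduced, n_obs=145, k=1)
print(f"chi-squared = {chi2:.3f} with {df} degrees of freedom")
```

For 24 variables and 4 factors the same degrees-of-freedom formula gives 276 - 96 + 6 = 186.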

Remarks:

(a) Fisher's Z-transformation,
$Z = \tanh^{-1} r = \frac{1}{2} \log_e \frac{1+r}{1-r}$, is applied
to the elements of R and AA' so that they are approximately normally
distributed.

(b) Burt recommends this test in his 1952 paper (Reference ) as
the "most useful available when current factorial procedures are
employed".

Sokal (Reference 36) in his comparative study obtains some correct
results and points out the fact that small correlation matrices may
not provide enough degrees of freedom.

4. Lawley's Chi-squared Test

In the following we will consider a statistical test for the
number of common factors. This test should be used, though, for
large samples only and with ones in the main diagonal of the
correlation matrix.

Let N denote the sample size, |R| the determinant of the
matrix of observed correlations and $|\hat{P}|$ the determinant of the
maximum likelihood estimator ($\hat{P} = AA' + U^2$, where factor
loadings are determined by the maximum likelihood method) of the
population correlation matrix. Let the variables have a multivariate
normal distribution. Then

$$\chi^2 = N \log_e \frac{|\hat{P}|}{|R|} \qquad (4)$$

with

$$\nu = \frac{1}{2} \left[ (n-k)^2 - n - k \right]$$

degrees of freedom is used to test the hypothesis that k common
factors adequately explain the correlations at an assumed level of
significance.
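
The following sketch computes the degrees of freedom stated above together with a likelihood-ratio statistic. It assumes that equation (4) has the usual form N log_e(|P|/|R|), which is an assumption of this sketch where the printed equation is not legible; the determinant routine, the function names and the example matrix are likewise illustrative.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap changes the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= factor * a[c][cc]
    return det

def lawley_degrees_of_freedom(n, k):
    """((n - k)^2 - n - k) / 2 for n variables and k common factors."""
    return ((n - k) ** 2 - n - k) // 2

def lawley_statistic(R, P_hat, n_obs, k):
    """Assumed form of equation (4): N * ln(|P_hat| / |R|), plus its df."""
    chi2 = n_obs * math.log(determinant(P_hat) / determinant(R))
    return chi2, lawley_degrees_of_freedom(len(R), k)

# Hypothetical 4 x 4 correlation matrix; a perfect reproduction gives zero.
R = [
    [1.00, 0.42, 0.35, 0.30],
    [0.42, 1.00, 0.28, 0.25],
    [0.35, 0.28, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]
chi2_perfect, df = lawley_statistic(R, R, n_obs=145, k=1)
print(chi2_perfect, df)
```

Note that for 24 variables and 4 factors this gives 186 degrees of freedom, the same count as the expression used in Burt's test.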

Lawley (Reference 42), who derived the above formula, simplified
it by approximation to the following $\chi^2$ formula:

$$\chi^2 = N \sum_{i<j=1}^{n} \frac{\bar{r}_{ij}^2}{u_i^2 \, u_j^2} \qquad (5)$$

(corrected residuals), where $\bar{r}_{ij}$ denote the residuals
obtained by

$$\bar{r}_{ij} = r_{ij} - r'_{ij}$$

with $r'_{ij}$ being the elements of $\hat{P}$, that is, the (maximum
likelihood estimated) reproduced correlations.

Remarks:

(a) Harman (Reference 2) states that usually, that is by other
than statistical means, one underestimates the number of statistically
significant factors, compared with the number of factors one obtains
by application of the $\chi^2$-test.

(b) As Harman points out, it is reasonable to apply the test
also to problems where the maximum likelihood method is not employed
to estimate P, if one draws a conclusion only in the case where the
$\chi^2$-value is found to be insignificant. In case the
$\chi^2$-value is significant, though, no conclusion can be made since
it is possible that a maximum likelihood factorization gives better
results.

(c) Rippe (Reference 48) arrives at a formula identical with the
likelihood ratio (equation 4), his development not being specifically
dependent on maximum likelihood estimates of factor loadings.

(d) An experimental study of the test was furnished by Henrysson
(Reference 49).


C. Miscellaneous Tests for Completeness.

1. Index of Completeness of Factorization

If the uniqueness of a variable $X_j$ is broken down into
unreliability $e_j^2$ and specificity $b_j^2$, that is

$$u_j^2 = b_j^2 + e_j^2,$$

then the index of completeness of factorization is defined by

$$C_j = \frac{100 \, h_j^2}{h_j^2 + b_j^2} \qquad (h_j^2 \text{ the communality}).$$

This index can well be used to decide whether factorization was carried
too far or not; for almost no variable should the index be in excess
of 100. Especially in the analysis of psychological tests into common
factors, this analysis should not be carried to the point where real
specific factors disappear.
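
As a small worked illustration, the index can be computed as 100 times the communality divided by the sum of communality and specificity. The sketch assumes that form of the index, and the numerical values are hypothetical; a negative specificity estimate stands for a variable whose communality exceeds what its reliability allows.

```python
def completeness_index(communality, specificity):
    """Index of completeness: 100 * h^2 / (h^2 + b^2)."""
    return 100.0 * communality / (communality + specificity)

# Hypothetical variable with communality 0.60 and specificity 0.20.
index_ok = completeness_index(0.60, 0.20)

# Hypothetical over-factored variable: estimated specificity is negative.
index_over = completeness_index(0.85, -0.05)

print(f"{index_ok:.2f}  {index_over:.2f}")
```

An index above 100, as in the second case, would signal that factorization was carried too far for that variable.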

2. Listing of Other Completeness Tests

There does exist a variety of other methods for checking completeness
of factorization. Since they are partly simple ones from the early
days of factor analysis or do not have important effects on factor
analysis, we will indicate them here only, referring to the papers
where they can be found.

(1)  Plotting  the  distribution  of  the  residuals  after  extraction 
of  k  factors  and  comparing  this  distribution  with  the  normal 

curve  is  described  in  Cattell  (Reference  24,  pp.  297-298)  as  complete¬ 
ness  check. 

(2)  See  Mosier  (Reference  33)  for  a  comparison  of  six  simple 
methods.  A  short  description  of  three  of  these  methods,  which  were 
found  to  be  rather  effective,  is  given  in  Cattell  (Reference  24) . 

(3)  See  Reyburn  and  Taylor  (Reference  50)  for  a  method  which 
compares  the  frequency  distribution  of  the  quotients  of  a  residual 
over  the  standard  error  of  its  corresponding  original  correlation 
with  the  normal  distribution. 

(4)  Coombs  (Reference  51)  suggests  a  test  for  the  centroid 
solution  by  counting  the  number  of  negative  signs  left  in  the 
residual  matrix  after  every  possible  variable  reflection  has  been 
carried  out  and  compares  them  with  the  number  C  of  a  table  set 
up  by  Coombs,  which  depends  on  the  number  of  variables. 

(5)  Swineford  (Reference  52)  correlates  the  original  correlations 
with  the  series  of  corresponding  residuals  and  continues  factorization 
until  this  correlation  becomes  insignificant. 

(6)  Hoel  (Reference  53)  attempts  in  his  paper,  less  fruitfully 
though,  the  development  of  a  significance  test  for  the  number  of 
common  factors.  See  also  Burt  (Reference  34)  for  a  short  outline 
of  the  method. 

(7)  Wilson  and  Worcester  (Reference  54)  describe  a  chi-squared 
test . 

(8)  Young  (Reference  55)  derives  an  index  of  clustering. 

(9)  In  the  situation  where  we  are  dealing  with  component 
analysis  (unities  are  employed  in  the  main  diagonal  of  the  correlation 
matrix)  Hotelling  (Reference  56)  and  Bartlett  (Reference  57)  have 
provided  statistical  tests  for  the  number  of  significant  factors. 

(10) Humphrey (see Fruchter, Reference 35) defined a completeness
criterion which takes into account the sample size and depends on
the loadings of only two variables. He multiplies the two highest
loadings in a column of the centroid factor matrix and compares the
product with the standard error of the zero correlation coefficient
to establish the significance or insignificance of the factor under
consideration.

(11) It is noted for information that there exists a listing of
twenty-five completeness criteria by Vernon, et al. (Reference 58).

4.6  EIGENVALUES  AND  THEIR  BOUNDS. 

A critical problem in factor analysis is the determination of the
sample size, denoted by the number of observations N. This problem
can be seen with respect to direct dependence of N on the number of
variables or with respect to the factor analysis one wants to conduct.
The question of how the number of observations depends on the
number of variables is answered by factor analysts by such rules of
thumb as: the ratio of the number of observations to the number of
variables shall exceed 3 (or shall exceed 5); the number of observations
minus the number of variables shall exceed 80. No good mathematical
means has as yet been obtained for a better determination of this
relationship. One indication of this relationship can be exhibited,
however. On a geometrical basis (see 2.4) one finds that, if n =
number of variables, m = supposed number of common factors and the
factors are considered to be uncorrelated, then the m common factors
and n unique factors are represented in N-space such that
$m + n \le N$, which determines: $N \ge m + n$.
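
The rules of thumb and the geometrical bound quoted above can be collected into a small check. This is only a convenience sketch; the function name and the dictionary keys are hypothetical, and the example uses N = 145 observations, 24 variables and 4 supposed common factors.

```python
def sample_size_checks(n_obs, n_vars, n_factors):
    """Evaluate the rules of thumb and the bound N >= m + n."""
    return {
        "ratio_exceeds_3": n_obs / n_vars > 3,
        "ratio_exceeds_5": n_obs / n_vars > 5,
        "difference_exceeds_80": n_obs - n_vars > 80,
        "geometric_bound": n_obs >= n_factors + n_vars,
    }

checks = sample_size_checks(n_obs=145, n_vars=24, n_factors=4)
for rule, satisfied in checks.items():
    print(f"{rule}: {satisfied}")
```

Passing all of these checks does not by itself make the factors reliable in the sense discussed in the remainder of this subsection.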

The  investigation  reported  in  this  subsection  takes  the  second  way 
of  approach  to  the  problem,  namely  to  consider  the  sample  size  N  in 
the  light  of  the  factor  analysis  to  be  conducted.  In  considering  at 
all  the  problem  of  how  large  the  sample  should  be,  we  are  assuming, 
that  if  we  would  arbitrarily  choose  an  N  without  reflecting  upon 
anything,  we  might  obtain  less  "reliable"  factors.  Here  we  want  to 
understand  by  a  reliable  factor  a  factor  whose  loadings  would  change 
only  little  if  the  factor  analysis  would  be  conducted  on  a  correlation 
matrix  of  the  same  variables  but  with  a  larger  number  of  observations. 


The solution to the problem was sought in statistical properties.
Two assumptions had to be made: firstly, the assumption that all
elements $r_{jk}$ of the correlation matrix R be greater than 0,
denoted by R > 0, and secondly, the assumption that the population of
pairs of observations, i = 1, ..., N, from a sample of which each
element of R is computed, satisfies the bivariate normal distribution
model. The first assumption is not so stringent, since many correlation
matrices with small negative entries can be reduced to this form; the
second assumption is one which is mostly made to guarantee statistical
considerations on $r_{jk}$. The case where some elements of R are equal
to zero can be considered also, if only R satisfies the
"irreducibility" properties, which will be introduced a little later.

The statistical means to associate sample size N with the loadings
of the factors, obtained by factor analyzing the correlation matrix R,
is found in the confidence intervals, which one can compute for each
element $r_{jk}$ of R. By forming confidence intervals we assume that
the observed correlation coefficients are only estimates of the true
population correlation coefficients. The larger N is, the more does
the observed coefficient approach the population coefficient, so that
the difference between the observed $r_{jk}$ and the confidence limits
can be called the error due to N. Now we are interested in how these
errors propagate through the factor analysis. Since the most popular
method for obtaining a factor analysis of R is the principal-factor
method, where the factor loadings are directly computed from eigenvalues
of the correlation matrix, the question we ask is the following: How
much does the error, introduced into the correlation matrix R by way
of the fact that the elements of R are only N-dependent estimates of
the true correlation coefficient, influence the eigenvalues of R?

To obtain information about this, the following procedure is suggested.
For each $r_{jk}$ confidence limits are computed according to the
technique outlined in Section 2.5. For each $r_{jk}$, we obtain two
confidence limit values, which we denote by $r_{jk}^{(1)}$ and
$r_{jk}^{(2)}$, with $r_{jk}^{(1)} < r_{jk} < r_{jk}^{(2)}$.
If an $r_{jk}$ is computed to be insignificantly different from zero, we
insert the value 0.001 (or if there is an $r_{jk} < 0.001$, an even
smaller value than 0.001) for it into $R_1$ (since we do not want any
actual zero values in $R_1$); the value $r_{jk}$ itself, however, is
inserted in $R_2$. Expressing the above in matrix notation we obtain, if
$r_{jk}^{(1)} \in R_1$, $r_{jk} \in R$, and $r_{jk}^{(2)} \in R_2$:
$R_1 < R \le R_2$. Then we conduct principal-factor analysis on the
three matrices $R_1$, R, and $R_2$.

We encounter some difficulties here. If we assume ones in the main
diagonals of R, $R_1$, and $R_2$, then R is Gramian, while $R_1$ and
$R_2$ are symmetrical but not necessarily positive semidefinite. On
the other hand, seldom is a factor analysis done on R with ones in
the main diagonal; rather squared multiple correlations or other
communality estimates are inserted in the diagonal. So R also
differs slightly from being Gramian. How far it departs from being
Gramian is determined by the number and size of negative eigenvalues.
If they are small and few in number they can be neglected. We make
use of this fact for the eigenvalues of $R_1$ and $R_2$. If N is
large, $R_1$ and $R_2$ approximate R closely, so that they will not
be too non-Gramian.

Under the assumption that R > 0, also $R_1 > 0$ and $R_2 > 0$.
This is based on the fact that the confidence intervals for each
element of R do not extend across the zero point. If they did extend
across the zero point, the population correlation coefficient
could be zero. But this is excluded from consideration since each
correlation coefficient is first tested for this hypothesis and the
confidence limits are only computed if the population coefficient is
not equal to zero.

Thus, since $R_1 < R \le R_2$ and $R_1 > 0$, $R_2 > 0$, we can express
$R_1$ and $R_2$ as: $R_1 = R - E_1$ and $R_2 = R + E_2$, respectively,
where $E_1$ has only positive entries and $E_2$ has positive and
(or only) zero entries.
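
The construction of the two bounding matrices can be sketched as follows. Section 2.5 is not reproduced here, so the sketch assumes that the confidence limits are obtained with Fisher's Z-transformation at the 95% level; this assumption reproduces the limits quoted for N = 145 in the example of this subsection. The function names are hypothetical, the diagonal is left untouched, and all correlations are assumed positive.

```python
import math

Z_95 = 1.959964  # two-sided 95% point of the standard normal distribution

def fisher_limits(r, n_obs, z_crit=Z_95):
    """Lower and upper confidence limits for a correlation via Fisher's Z."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n_obs - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

def bounding_matrices(R, n_obs, floor=0.001):
    """Build R1 (lower limits) and R2 (upper limits) from a positive matrix R."""
    n = len(R)
    R1 = [row[:] for row in R]
    R2 = [row[:] for row in R]
    for j in range(n):
        for k in range(n):
            if j == k:
                continue  # diagonal entries are not altered in this sketch
            r = R[j][k]
            lower, upper = fisher_limits(r, n_obs)
            if lower <= 0.0:
                # r is insignificantly different from zero: small positive
                # value into R1, the observed value itself into R2
                R1[j][k] = floor if r > floor else r / 2.0
                R2[j][k] = r
            else:
                R1[j][k] = lower
                R2[j][k] = upper
    return R1, R2

# Hypothetical 3 x 3 correlation matrix with one insignificant entry (0.10).
R = [
    [1.000, 0.176, 0.100],
    [0.176, 1.000, 0.723],
    [0.100, 0.723, 1.000],
]
R1, R2 = bounding_matrices(R, n_obs=145)
print([round(v, 3) for v in R1[0]], [round(v, 3) for v in R2[0]])
```

Halving an entry that is already below 0.001 is one arbitrary way of choosing "an even smaller value"; any positive value below the observed entry would serve.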

Our objective will now be to show the following: If $r_{jk}^{(1)}$ and
$r_{jk}^{(2)}$ represent the lower and upper 95%-confidence limits on
the correlation coefficient $r_{jk}$ (the $r_{jk}^{(1)}$ and
$r_{jk}^{(2)}$ values being defined as above if $r_{jk}$ is
insignificant), such that $r_{jk}^{(1)} < r_{jk} \le r_{jk}^{(2)}$,
and if $r_{jk}^{(1)} \in R_1 > 0$, $r_{jk} \in R > 0$, and
$r_{jk}^{(2)} \in R_2 > 0$, then $\lambda_1 < \lambda \le \lambda_2$,
where $\lambda_1$, $\lambda$, and $\lambda_2$ are the largest
eigenvalues obtained for the matrices $R_1$, R, and $R_2$.

Since the loadings of the first factor are directly computed
from the largest eigenvalue, the result, which we will prove below,
clearly links the loadings (by which we judge a factor analysis to
be reliable) to the sample size N: the larger N is, the smaller
will the interval $(r_{jk}^{(1)}, r_{jk}^{(2)})$ be, and
correspondingly the interval $(\lambda_1, \lambda_2)$.

Now let us prove the statement $\lambda_1 < \lambda \le \lambda_2$
(under the above made assumptions). As we pointed out earlier in this
subsection, we can make the assumption $R \ge 0$, but then R has to
satisfy the irreducibility condition introduced by the following

Definition 4.1: For $n \ge 2$ an n x n matrix R with real
elements is called reducible if there exists an n x n permutation
matrix P (defined as a square matrix which in each row and in
each column has some one entry unity, all others zero), such that

$$P R P^{T} = \begin{pmatrix} R_{1,1} & R_{1,2} \\ 0 & R_{2,2} \end{pmatrix},$$

where $R_{1,1}$ is an r x r submatrix and $R_{2,2}$ is an
(n - r) x (n - r) submatrix with $1 \le r < n$. If no such permutation
matrix exists, then R is called irreducible. If R is a 1 x 1 matrix,
then R is irreducible if its single entry is nonzero and reducible
otherwise.

In the proof of our statement we will have to use either one of
two theorems, according to the assumptions made on R. If R > 0,
we shall use Perron's Theorem (Theorem 3.8); if $R \ge 0$ and R is
irreducible we shall use the following Theorem 4.4, due to Frobenius,
an extension of Perron's Theorem to irreducible matrices.

Theorem 4.4: An irreducible matrix $R \ge 0$ always has a positive
eigenvalue $\lambda$ which is a simple root of the characteristic
equation. The moduli of all other characteristic numbers are at most
$\lambda$. The eigenvector corresponding to $\lambda$ has positive
components and is essentially unique (up to scale factors).

The proof of Frobenius' Theorem can also be found in Gantmacher
(Reference 22).


Proof of the statement $\lambda_1 < \lambda \le \lambda_2$, where
$\lambda_1$ is the largest eigenvalue of $R_1$, $\lambda$ the largest
of R, and $\lambda_2$ is the largest of $R_2$. If x is an eigenvector
belonging to $\lambda$, and $x_1$ is an eigenvector belonging to
$\lambda_1$, we have

$$R x = \lambda x \qquad (6)$$

$$R_1 x_1 = \lambda_1 x_1 \qquad (7)$$

We have

$$R_1 = R - E_1, \qquad E_1 > 0,$$

and taking the inner product of Equation 7 with x we obtain:

$$(x \mid R_1 x_1) = \lambda_1 (x \mid x_1)$$

$$(R_1 x \mid x_1) = \lambda_1 (x \mid x_1) \quad \text{since } R_1 \text{ is real and symmetric}$$

$$[(R - E_1) x \mid x_1] = \lambda_1 (x \mid x_1)$$

$$(R x \mid x_1) - (E_1 x \mid x_1) = \lambda_1 (x \mid x_1)$$

$$\lambda (x \mid x_1) - (E_1 x \mid x_1) = \lambda_1 (x \mid x_1)$$

$$\lambda - \frac{(E_1 x \mid x_1)}{(x \mid x_1)} = \lambda_1$$

$(x \mid x_1)$ is not equal to zero, since x and $x_1$ have only
positive components, according to Perron's (or in case it is
applicable, Frobenius') Theorem, applied to R and $R_1$. So the term
$(E_1 x \mid x_1)/(x \mid x_1)$ is positive and thus
$\lambda_1 < \lambda$. In the same manner it is proved that
$\lambda \le \lambda_2$, with the equality holding if $E_2$ is the zero
matrix, which makes $(E_2 x \mid x_2)/(x \mid x_2)$ equal to zero.
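
The inequality can be checked numerically on a small case. The sketch below uses a plain power iteration to approximate the largest eigenvalue of a positive matrix; the matrices are hypothetical, and the perturbations are taken as 0.1 off the diagonal and zero on it, which is an assumption made for brevity (the quadratic form used in the proof remains positive because the eigenvectors have positive components).

```python
def largest_eigenvalue(matrix, iterations=500):
    """Approximate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    x = [1.0] * n  # positive starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        eigenvalue = max(y)            # x is scaled so that its largest entry is 1
        x = [v / eigenvalue for v in y]
    return eigenvalue

# Hypothetical positive correlation matrix R and its bounding matrices.
R = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
R1 = [[R[i][j] - (0.1 if i != j else 0.0) for j in range(3)] for i in range(3)]
R2 = [[R[i][j] + (0.1 if i != j else 0.0) for j in range(3)] for i in range(3)]

lam_1 = largest_eigenvalue(R1)
lam = largest_eigenvalue(R)
lam_2 = largest_eigenvalue(R2)
print(f"{lam_1:.4f} < {lam:.4f} < {lam_2:.4f}")
```

Power iteration is used here only because it needs no external library; any symmetric eigenvalue routine would serve equally well.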

Some remarks about the result shall be made next. From the
analysis of the principal-factor method it follows that:

$$\lambda_1 = \sum_{j=1}^{n} {}_{(1)}a_{j1}^2, \qquad \lambda = \sum_{j=1}^{n} a_{j1}^2, \qquad \text{and} \qquad \lambda_2 = \sum_{j=1}^{n} {}_{(2)}a_{j1}^2,$$

or, the largest eigenvalues of $R_1$, R, and $R_2$ are equal to the
sum of the contributions of the first factor (in each respective factor
analysis) to the total communality of each analysis. The length of the
interval for $\lambda$, namely the difference

$$\lambda_2 - \lambda_1 = \sum_{j=1}^{n} {}_{(2)}a_{j1}^2 - \sum_{j=1}^{n} {}_{(1)}a_{j1}^2 = \sum_{j=1}^{n} \left[ {}_{(2)}a_{j1}^2 - {}_{(1)}a_{j1}^2 \right]$$

is the largest difference which we can get between the sums of the
squared factor loadings of the two first factors, obtained by factor
analyzing $R_1$ and $R_2$. The difference approaches zero when N
increases, since the length of the interval for $\lambda$ then becomes
smaller.

It was wished to determine the sample size N. The difference

$$\sum_{j=1}^{n} \left[ {}_{(2)}a_{j1}^2 - {}_{(1)}a_{j1}^2 \right]$$

indicates how much the contribution of the first factor can vary in
dependence on N. In other words, it can be checked whether an
assumed N is large enough so that the variation of the first factor
contribution to the total communality of R does not exceed a given
value (which could perhaps be computed as a percentage of the total
communality).
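
The check described above can be sketched as follows. The eigenvalues and the total communality are those reported for the example in this subsection; the tolerance of 10% of the total communality and the function names are hypothetical choices made for this sketch only.

```python
def first_factor_variation(lambda_1, lambda_2, total_communality):
    """Return the interval width and its share of the total communality of R."""
    width = lambda_2 - lambda_1
    return width, width / total_communality

def sample_size_adequate(lambda_1, lambda_2, total_communality, max_fraction):
    """True if the interval width stays within the chosen share of communality."""
    _, fraction = first_factor_variation(lambda_1, lambda_2, total_communality)
    return fraction <= max_fraction

# Largest eigenvalues of R_1 and R_2 and total communality of R from the example.
width, fraction = first_factor_variation(4.3884, 10.8149, 11.8761)
adequate = sample_size_adequate(4.3884, 10.8149, 11.8761, max_fraction=0.10)
print(f"width = {width:.4f}, share = {fraction:.1%}, adequate = {adequate}")
```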


Example:  An  example  was  computed  to  show  the  proposed  method. 

The  problem  of  24  psychological  variables,  whose  correlation  matrix 
and  analysis  are  reported  in  Harman  (Reference  2,  page  137  and  page  185) 
was  taken  for  this  example.  The  one  insignificant  negative  value, 
which  Harman's  matrix  contains,  was  changed  to  an  insignificant  positive 
one  in  order  to  meet  the  requirements  for  application  of  Perron's 
Theorem.  The  sample  size  as  given  in  Harman  is  N=145. 

Let us briefly outline the kind of computations done for the
example.

(1) The two matrices $R_1$ and $R_2$ were computed according to
the discussion in this subsection. The value 0.001 was inserted into
$R_1$, while the values $r_{jk}$ themselves were inserted into $R_2$
when $r_{jk}$ was found to be insignificant.

(2) Squared multiple correlation coefficients were computed for
the three matrices.

(3) Factor analyses were conducted on the matrices $R_1$, R, and
$R_2$. The eigenvalues and factor loadings were obtained.

For a comparative study let us now consider the obtained values.
We list the positive eigenvalues in Table 4 and then the first-factor
loadings, computed from the 3 first (largest) eigenvalues, in Table 5.

It is also interesting to list the following data:


         Total Original    Sum of Positive    Sum of Negative
         Communality       Eigenvalues        Eigenvalues

R_1          7.9184            9.8388             1.9204
R           11.8761           13.4935             1.6174
R_2         21.6238           22.0108             0.3870


                          Table 4

        The Positive Eigenvalues of R_1, R, and R_2

R_1 eigenvalues       R eigenvalues       R_2 eigenvalues

λ_1 = 4.3884          λ = 7.6665          λ_2 = 10.8149
      1.6844              1.6634                 2.2210
      1.1014              1.1785                 1.6392
      0.8292              0.9212                 1.4155
      0.4342              0.4319                 0.9245
      0.3608              0.4064                 0.8684
      0.2811              0.3199                 0.6861
      0.2512              0.3024                 0.6596
      0.2136              0.2513                 0.5467
      0.1482              0.1759                 0.4625
      0.1084              0.1082                 0.3760
      0.0379              0.0433                 0.3235
                          0.0246                 0.2860
                                                 0.2108
                                                 0.1943
                                                 0.1598
                                                 0.1358
                                                 0.0576
                                                 0.0286

                          Table 5

            The First-Factor Loadings Computed
             from the Three First Eigenvalues

λ_1 = 4.3884          λ = 7.6665          λ_2 = 10.8149
factor loadings       factor loadings     factor loadings

    0.4293                0.5952              0.7012
    0.1594                0.3751              0.4784
    0.2222                0.4297              0.5637
    0.3130                0.4839              0.5906
    0.6337                0.6901              0.7729
    0.6336                0.6883              0.7620
    0.6291                0.6728              0.7407
    0.5930                0.6819              0.7595
    0.6563                0.6898              0.7540
    0.2744                0.4649              0.5586
    0.3711                0.5588              0.6725
    0.2643                0.4669              0.5873
    0.4308                0.6038              0.7181
    0.2016                0.4268              0.5544
    0.1465                0.3896              0.5334
    0.2897                0.5144              0.6395
    0.2327                0.4631              0.6180
    0.2877                0.5177              0.6614
    0.2284                0.4511              0.5702
    0.4831                0.6164              0.7288
    0.4252                0.5969              0.7285
    0.4746                0.6129              0.7229
    0.5827                0.6895              0.7876
    0.5174                0.6532              0.7621

Interpreting the obtained results, the following can be said:

(1) The 3 matrices can be considered as not too non-Gramian,
the size of the negative eigenvalues being small. Especially, the
number and size of the negative eigenvalues of $R_2$ are small. Here,
though, a difficulty arose when a squared multiple correlation
coefficient, as estimate of communality, turned out to be larger than
one (based on the fact that $R_2$ with ones in the main diagonal
has not, as R does, the representation as $R = ZZ^{T}/N$).

(2) Table 5 shows the expected result that all factor loadings
of the three first factors, as derived from positive eigenvalues and
eigenvectors, are positive.

(3) It is interesting to note that both $R_1$ and R show four
distinctively large eigenvalues while there is a sharp drop in the
size of the eigenvalues after the fourth ones. Each time the four
eigenvalues account for more than 95% of the total original
communality. Harman suggests the interpretation of four factors, which
is applicable to the results of $R_1$. $R_2$ shows six distinctive
eigenvalues with four of them being over one. But one has to consider
11 eigenvalues to account for 95% of the original total communality,
while 4 (6) eigenvalues account for slightly more than 70% (80%) of the
original total communality.

(4) As for the main objective, the determination of N, the
result shows that N = 145 is too small to furnish a reliable factor
analysis. The confidence intervals on the correlations are already
very large. For example:

    0.013 < 0.176 < 0.330   for a small r_jk,
    0.635 < 0.723 < 0.793   for a large r_jk.

The interval on the first eigenvalue is consequently also large:

    4.3884 < 7.6665 < 10.8149,

so that the difference

    λ2 - λ1 = 10.8149 - 4.3884 = 6.4265

cannot even be expressed as a reasonably small percentage of the
original total communality of R.

Disregarding R2 (because of the difficulty of communalities
larger than one) and considering only R and R1, we compute
λ - λ1 = 3.2781, which is 27.6% of the original total communality
11.8761, still considerably high.
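
The correlation intervals quoted above can be checked numerically. The following Python sketch assumes they are two-sided 95% intervals obtained with Fisher's z-transformation; this section does not state the method, so that is an assumption, and the function name and the constant 1.959964 are illustrative choices.

```python
import math

def fisher_interval(r, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation r from n observations.

    Uses Fisher's z = atanh(r), whose standard error is 1 / sqrt(n - 3).
    z_crit is the normal quantile for the chosen level (95% assumed here).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# The two correlations quoted in the text, with N = 145.
small = fisher_interval(0.176, 145)
large = fisher_interval(0.723, 145)
print("small r:", small)
print("large r:", large)
```

Under these assumptions the computed bounds agree with the quoted ones to three decimals, which suggests, but does not prove, that this is how the matrices R1 and R2 were built.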

It must be concluded that the sample size N = 145 is too small;
it would be desirable to have more observations and to repeat the
factor analysis. On the other hand, both the R1- and the
R-analysis yield the same number of factors used for interpretation,
which might suggest the contrary. This emphasizes the fact, which
Harman also indicates, that proper statistical considerations are
often lengthy but do not furnish better results.

4.7  FACTOR SCORES

The  computational  problem  of  representing  observed  variables 
in  terms  of  hypothetical  variables  or  factors  F  is  only  partly 
solved  when  the  factor  loadings  A  are  computed .  The  factor  loadings 
serve  to  describe  the  number  of  factors  and  the  saturation  of 
variables  by  a  factor.  And  for  some  purposes,  such  as  interpretation 
of  factors,  the  loadings  are  sufficient.  However,  the  complete 
representation  is  obtained  only  when  the  factors  themselves  are 
also  computed. 

In the case where the factor pattern takes the form

    Z = AF

due to inserting unities on the diagonal of the correlation matrix,
i.e., no unique factors are allowed or postulated, the common
factors F may be solved for directly since the matrix A is a
square n x n nonsingular matrix. Indeed

    F = A^{-1} Z.

When communalities are placed on the diagonal of the correlation
matrix, the number of common and unique factors is greater than the
number of variables, and therefore the factor loading matrix is
singular with no inverse. In keeping with the original assumption
of factor analysis that each variable is a linear function of the
factors, it is now assumed that each factor is a linear function of
the variables. However, since there are more factors than variables,
the factors defined by the original linear form can only be estimated
in a least squares sense by the linear form

    \hat{F}_p = \beta_{p1} Z_1 + \cdots + \beta_{pn} Z_n    (p = 1, 2, \ldots, m).

It is shown (Reference 2, p. 340) that

    \hat{F}_p = S_p^T R^{-1} Z    (8)

gives least squares estimates of the factors, where the subscripts
denote columns. Factors estimated using Equation 8 have zero mean
and a standard deviation close to one but varying from factor to
factor.
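
Equation 8 can be tried out on synthetic data. The sketch below is only an illustration under stated assumptions: the loading matrix A, the sample size, and the function name are hypothetical, the factors are taken as orthogonal so that the structure matrix S equals the pattern A, and the data are simulated rather than taken from this report.

```python
import numpy as np

def estimate_factor_scores(S, R, Z):
    """Least squares factor estimates F_hat = S^T R^{-1} Z (Equation 8).

    S : n x m structure matrix (variable-factor correlations)
    R : n x n correlation matrix of the variables
    Z : n x N matrix of standardized scores, one row per variable
    """
    return S.T @ np.linalg.solve(R, Z)

# Hypothetical orthogonal two-factor pattern for three variables.
A = np.array([[0.8, 0.0],
              [0.7, 0.3],
              [0.0, 0.9]])
unique = np.sqrt(1.0 - (A ** 2).sum(axis=1))   # unique-factor loadings a_j

rng = np.random.default_rng(0)
N = 500
F = rng.standard_normal((2, N))                # common factors
U = rng.standard_normal((3, N))                # unique factors
Z = A @ F + unique[:, None] * U
Z = (Z - Z.mean(axis=1, keepdims=True)) / Z.std(axis=1, keepdims=True)
R = Z @ Z.T / N                                # sample correlation matrix

F_hat = estimate_factor_scores(A, R, Z)
print("means:", F_hat.mean(axis=1))
print("standard deviations:", F_hat.std(axis=1))
```

Because the estimates are linear in the standardized scores, their means are zero; their standard deviations depend on how well each factor is determined by the variables.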
Section V

THE  ROTATION  PROBLEM 

5.1  INTRODUCTION 

Having discussed factor solutions and the problems pertaining to
them, we turn to the next step in factor analysis, rotation, which
is considered in this section. In 5.2 the rotation problem is
stated; 5.3 gives a survey of existing rotation techniques.

An especially interesting problem is that of interpreting oblique
factors. Many factor analysts prefer to keep to orthogonality, since
the problems raised by the fact that in the oblique rotation factor
pattern and factor structure are no longer equal cannot be
satisfactorily taken care of. On the other hand, an oblique solution
might be the only adequate solution to a given problem. Therefore the
important topic of interpretation of oblique factors is taken up in 5.4.

5.2  THE  ROTATION  PROBLEM 

The second part of every worthwhile factor analysis is factor
rotation. This procedure involves accepting a factor pattern (and other
matrices in the oblique case) with an already determined number of factors
and performing a sequence of iterative matrix operations on it to re-orient
the factor reference frame according to preset boundary conditions
or constraints. The basic correlation matrix with communalities is
preserved and must still be the result of FF^T (in model form).

There is an infinite number of ways to rotate the primary factor
pattern which results from, say, a centroid or principal components
analysis. Consider for a moment the analogy of defining the locus of
points equidistant from the origin of a Cartesian coordinate system,
each point simply representing another orientation of the end of a line in
2-space. A graphical illustration of a typical two-factor rotation where
the variables are represented by points in a plane is shown below.
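
The statement that rotation preserves the reproduced correlation matrix can be verified on a small case. In the Python sketch below the pattern A and the rotation angle are hypothetical values chosen for illustration, and only an orthogonal rotation in two-factor space is shown.

```python
import numpy as np

def rotate_two_factors(A, theta):
    """Rotate a two-factor pattern A (n x 2) through the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, -s],
                  [s,  c]])        # orthogonal: T @ T.T = I
    return A @ T

# Hypothetical unrotated pattern for three variables.
A = np.array([[0.7, 0.3],
              [0.6, 0.5],
              [0.2, 0.8]])
B = rotate_two_factors(A, np.deg2rad(35.0))

# The loadings change, but the reproduced correlations and communalities do not.
same_R = np.allclose(A @ A.T, B @ B.T)
same_h2 = np.allclose((A ** 2).sum(axis=1), (B ** 2).sum(axis=1))
print(same_R, same_h2)
```

Any angle gives an equally valid pattern, which is why an external criterion such as simple structure is needed to choose among them.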

The rotation of the reference frame to a "preferred" or "simplified"
position is both difficult and ambiguous. It is this process which is the
cause of much controversy concerning the definition of a preferred,
simplified or best solution. There have been and still are several
schools of thought on this issue dating back to the origin of factor
analysis. The most popular definition, simple structure, was developed
by the psychologists and is used extensively today. Other
structures which are used from time to time include multiple-group,
uni-factor, and bi-factor and are characterized by a preset factor
pattern into which the loadings are to be fitted. Simple structure,
on the other hand, represents a quasi-definite ordering of a desirable
multiple-factor solution based on the factor interaction experience of the
behavioral scientists. The resultant pattern initially was one containing
mostly very high and very low loadings distributed in such a way that
the following three conditions were met:

1. each row should contain at least one zero

2. each column should contain at least as many zeros as there
are common factors

3. for every pair of factors there should be at least m variables
which do not load high on both factors (m being the number of common factors)

These conditions were first established by Thurstone (Reference 44)
and later extended by him to provide "insurance" in his own studies as
follows (Reference 25):

1.  each  row  should  contain  at  least  one  zero 

2.  each  column  should  contain  at  least  as  many  zeros  as  there  are 
common  factors 

3.  for  every  pair  of  factors  there  should  be  several  variables 
which  do  not  load  high  on  both  factors 

4.  for  every  pair  of  factors  a  large  proportion  of  variables 
should  have  zero  loadings  on  both  factors  when  there  are  more  than  three 
factors 

5.  for  every  pair  of  factors  there  should  be  only  a  small  number 
of  variables  with  nonzero  loadings  on  both  factors. 

If simple structure is decided to be the acceptable format for a
factor pattern, one may choose from several factor rotation techniques, each
of which provides a slightly different variation of the main theme. If
another factor structure is desired, rotation may be exceedingly complex
if not impossible.

5.3  SURVEY  OF  ROTATION  TECHNIQUES 

In general there are two distinct categories of factor rotation,
orthogonal and oblique, which differ widely both in concept and
interpretation. The idea of strictly uncorrelated factors in the orthogonal
structure, whether simple structure or not, has contributed significantly
to the extensive usage of the orthogonal solution in a final analysis.
Simply summing the squares of all the factor loadings for any given variable
yields its common factor variance; thus, the importance of an individual
loading is easily determined. This is not at all the case in oblique
factor structures, where nonzero correlations among factors necessitate
rather tedious matrix manipulations which heavily tax the skills and
patience of the user. A simpler method to determine factor significance
is not yet known, but the problem is considered later on in this section.

It is indeed unfortunate that ease of interpretation has dictated the
unquestioned popularity of the orthogonal methods, since the shortcomings
of a linear model are compounded by the further restriction of uncorrelated
factors. If the intent of the analysis is one of discovering physical
entities, a more realistic model in the physical world is, of course, the
oblique factor structure (naturally there are many problems which fall
into the "straightforward" class, whereby all of the common factor
variance can be accounted for by the first few factors, which furthermore
can be interpreted as definite orthogonal physical factors). In data
reduction problems the orthogonal patterns are quite acceptable.

Helpful in the selection and comparison of simple structure rotation
techniques is Table 6, extracted in part from Harman (Reference 2, p. 310),
where short expressions for the Quartimax and Varimax orthogonal rotation
techniques and the Oblimax, Quartimin, Covarimin, Oblimin, and Kaiser-Dickman
oblique rotation techniques are given. The following notation is adopted
for the table:

    (a_{jp}) = initial factor matrix,

    (b_{jp}) = final factor matrix,

    (v_{jp}) = final factor structure matrix.

It should be noted that major differences in these techniques occur both
in the concept of a "best" simple structure and in computation procedures. The
orthogonal rotation problem is pretty well resolved by Varimax, Quartimax
at best being a good estimate. The oblique techniques require enormous
computation efforts and generally result in "not quite" solutions which
call upon Cattell's Maxplane, or Rotoplot, for polishing.
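
As a concrete illustration of an orthogonal simple-structure rotation, the Python sketch below applies a commonly used SVD-based iteration for the raw varimax criterion. It is not taken from Table 6 and may differ from the expressions given there; the function names, the tolerance, and the test pattern are assumptions made for the example.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Rotate the pattern A (n x m) orthogonally toward the raw varimax maximum.

    Returns the rotated pattern and the orthogonal transformation T.
    """
    n, m = A.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        # Gradient-like matrix of the varimax criterion at the current loadings.
        G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / n)
        U, sv, Vt = np.linalg.svd(G)
        T = U @ Vt                       # nearest orthogonal matrix to G
        d_new = sv.sum()
        if d_new < d_old * (1.0 + tol):  # no further meaningful improvement
            break
        d_old = d_new
    return A @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(L ** 2, axis=0).sum()

# Hypothetical simple structure, hidden by a 30 degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
A = simple @ np.array([[c, -s], [s, c]])

B, T = varimax(A)
print(np.round(B, 3))
print(varimax_criterion(A), varimax_criterion(B))
```

The rotated loadings may come back with columns permuted or reflected relative to the hidden pattern; communalities are unchanged because T is orthogonal.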
5.4  INTERPRETATION OF OBLIQUE FACTORS

A. Introduction

The factor analysis model involves the simultaneous linear
description of n variables by m common factors and n unique factors.

    Z_1 = a_{11} F_1 + a_{12} F_2 + \cdots + a_{1m} F_m + a_1 U_1
      \vdots
    Z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m + a_j U_j
      \vdots
    Z_n = a_{n1} F_1 + a_{n2} F_2 + \cdots + a_{nm} F_m + a_n U_n

The factors are, of course, hypothetical and their description is usually
given by a pattern matrix A = (a_{jp}) of common factor coefficients, and
a structure matrix S = (s_{jp}), the set of correlations between each
variable and factor.

The invariant part of a factor analysis solution is the subspace of
common factors, common-factor space, defined by the set of standardized
column vectors F_1, ..., F_m. The n-space of standardized variables
Z_j lies hopefully close to the m-space of common factors and each
variable is projected onto common-factor space by its unique factor,
a_j U_j. Selecting a particular solution for the factor analysis model
corresponds to selecting a set of basis vectors {F_1, ..., F_m} to describe
the invariant common-factor space.

The projection of variable Z_j on common-factor space is \hat{Z}_j, the
prediction of Z_j from the common factors alone,

    \hat{Z}_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m,

so that

    Z_j = \hat{Z}_j + a_j U_j.

Thus the variance of Z_j is

    (Z_j | Z_j) = 1 = (\hat{Z}_j | \hat{Z}_j) + 2 a_j (\hat{Z}_j | U_j) + a_j^2 (U_j | U_j).    (4)

The variance of \hat{Z}_j is called its communality h_j^2. Because the unique
factors are orthogonal to all common factors, (\hat{Z}_j | U_j) = 0, and since the
variance of the unique factors is one, Equation 4 becomes

    var(Z_j) = 1 = h_j^2 + a_j^2.

The communality h_j^2 is the variance explained by the common factors.
Both h_j and a_j are fixed for any factor analysis solution, hence
for the entire set of particular solutions generated by rotating the
common factors to different bases for common-factor space.

B.  Problems  with  Oblique  Factors 

In  order  to  understand  intuitively  the  dimensions  of  common- factor 
space  or  to  identify  factors  it  seems  likely  that  an  oblique  set  of 
factors  is  preferable.  In  addition,  a  factor  which  has  been  placed 
close  to  a  group  of  real,  observed  variables  would  seem  more  likely 
observable  itself. 

However, there are serious problems involved in the interpretation
of the output of oblique rotations which have discouraged many workers
from leaving orthogonality. The pattern and structure matrices are not
identical and they are both tricky. For some examples let us consider
two-factor space: \hat{Z}_j = a_{j1} F_1 + a_{j2} F_2. A variable may be
uncorrelated with a factor F_1 and yet have a high loading a_{j1} on it, or
it might have a large positive structure value s_{j1} and yet a negative
loading a_{j1}. (See Figure 6.)



The basic difficulty in interpreting the structure matrix, and part
of the reason for these seeming discrepancies between structure and
pattern, is that the variable-factor correlations are affected in the
oblique case by the correlations among factors. This will be explained in
more detail later. Moreover, two variables may be correlated with one
factor as high as .707 and yet be totally uncorrelated themselves, making
it difficult to pick out groups of variables and fit them to factors by
examining structure alone.

Figure 6. Relation between Factors and Variables
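
Both discrepancies between pattern and structure can be reproduced with two correlated factors. The Python sketch below uses the relation S = A Phi between the pattern A, the factor correlation matrix Phi, and the structure S, which follows from taking the correlation of \hat{Z}_j with each factor; the loadings and the factor correlation of 0.6 are hypothetical values chosen for illustration.

```python
import numpy as np

phi = 0.6                                   # assumed correlation between F_1 and F_2
Phi = np.array([[1.0, phi],
                [phi, 1.0]])

# Hypothetical pattern loadings for two variables.
A = np.array([[ 0.6, -1.0],    # variable 1: sizeable loading on F_1
              [-0.2,  0.9]])   # variable 2: negative loading on F_1

S = A @ Phi                                 # structure: variable-factor correlations
h2 = np.einsum("jp,jp->j", A, S)            # communalities, sum over p of a_jp * s_jp

print("structure:\n", S)
print("communalities:", h2)
```

Variable 1 has a loading of 0.6 on F_1 but is uncorrelated with it, while variable 2 has a positive correlation with F_1 despite a negative loading. Both communalities stay below one, so the cases are admissible.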

The problem with oblique factor patterns is that the sum of the
loadings squared for one variable,

    \sum_{p=1}^{m} a_{jp}^2,

need no longer equal the communality or even be less than one, as in the
orthogonal case. Although rotation tends to purge middle-sized loadings,
it may result in loadings greater than one, or in several large loadings
which indicate not so much linear determination as they do that the factors
are highly uncorrelated (or correlated) as in Figure 6.

We may state the problem: given the linear representation of \hat{Z}_j
as in Equation 2,

    \hat{Z}_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m,

how "important" is each factor in the determination of all the values of
the vector \hat{Z}_j?

The contribution to the variance of \hat{Z}_j is the measure usually used.
The variance of \hat{Z}_j is

    (\hat{Z}_j | \hat{Z}_j) = h_j^2 = \sum_{p=1}^{m} \sum_{q=1}^{m} a_{jp} a_{jq} (F_p | F_q)
                            = \sum_{p=1}^{m} \sum_{q=1}^{m} a_{jp} a_{jq} r_{F_p F_q}.    (6)

When the factors are orthogonal, it is

    r_{F_p F_q} = (F_p | F_q) = \begin{cases} 0, & p \neq q \\ 1, & p = q \end{cases}

so that

    h_j^2 = a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2.    (7)

Thus the contribution of each orthogonal factor to the variance of \hat{Z}_j
is the square of its loading. This clean resolution of variance explains
why orthogonal factors may be easily interpreted; their relative
importance in determining all the values for a variable can be separately
evaluated.

However, for oblique factors the terms containing r_{F_p F_q}, p \neq q,
do not drop out and we have

    h_j^2 = a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2
            + 2 (a_{j1} a_{j2} r_{F_1 F_2} + \cdots + a_{j1} a_{jm} r_{F_1 F_m})
            + 2 (a_{j2} a_{j3} r_{F_2 F_3} + \cdots + a_{j2} a_{jm} r_{F_2 F_m})
            + \cdots + 2 a_{j,m-1} a_{jm} r_{F_{m-1} F_m}.    (8)

The terms a_{jp}^2 are called direct contributions. The mixed terms may
be named two-factor interactions and they cause the problems. These
interactions are not variance "contributions" because, for one thing,
they may be negative. They may be looked upon best as corrections
applied to the direct contributions due to the tendency of one
factor to vary with another.

One method of separating the total variance into components for each
factor is to simply divide the interaction into halves and assign a half
to each factor. This results in:

    h_j^2 = \sum_{p=1}^{m} \sum_{q=1}^{m} a_{jp} a_{jq} r_{F_p F_q}

    h_j^2 = a_{j1} \sum_{p=1}^{m} a_{jp} r_{F_1 F_p} + \cdots + a_{jm} \sum_{p=1}^{m} a_{jp} r_{F_m F_p}
          = a_{j1} s_{j1} + \cdots + a_{jm} s_{jm}.    (9)

The terms of Equation 9 might be said to approximate the contribution
of each factor to the variance of Z_j. In the orthogonal case it reduces
to Equation 7 and it often gives an enticingly clear picture. But it also
results in negative values whenever a_{jp} and s_{jp} are of opposite sign.
This is one indication that the interactions are simply that, interactions,
and cannot be resolved into shares.
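
The halving rule of Equation 9 is easy to evaluate, and a small case shows how a negative "share" arises. In the Python sketch below the loadings and the factor correlation are hypothetical; each entry of the product A * S is one term a_jp s_jp of Equation 9.

```python
import numpy as np

Phi = np.array([[1.0, 0.6],
                [0.6, 1.0]])        # assumed factor correlations
A = np.array([[ 0.5, 0.4],          # hypothetical pattern, variable 1
              [-0.2, 0.9]])         # hypothetical pattern, variable 2

S = A @ Phi                         # structure matrix s_jp
shares = A * S                      # terms a_jp * s_jp of Equation 9
h2 = np.einsum("jp,jq,pq->j", A, A, Phi)   # communalities from Equation 6

print("shares:\n", shares)
print("row sums:", shares.sum(axis=1), "communalities:", h2)

# With orthogonal factors the shares reduce to squared loadings (Equation 7).
orthogonal_shares = A * (A @ np.eye(2))
```

For variable 2 the share assigned to F_1 is negative (about -0.068), although the shares still add up to the communality.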

Any  procedure  such  as  halving  the  interaction  which  gave  us  a  matrix 
of  contributions  to  variance  could  generate  an  orthogonal  pattern  simply 
by  taking  the  square  root  of  each  element.  Therefore,  because  each  set  of 
factors  has  only  one  pattern,  A,  no  oblique  factors  may  be  so  resolved. 

When  the  factors  are  correlated,  the  analysis  of  variance  model  (separating 
the  sums  of  squares)  can  no  longer  be  used  but  a  new  model  must  be  formulated. 
(Reference  61,  p.  464,  p.  634). 


C. Factor Analysis and Regression Analysis


To  facilitate  the  development  of  a  statement  concerning  the  contri¬ 
bution  of  oblique  factors  to  variance,  let  us  show  that  the  factor  pattern 
equations  are  a  set  of  classical  regression  equations  of  the  variables  on 
m  common  factors. 


The factor analysis description of a variable Z_j,

    Z_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m + a_j U_j,    (10)

is simply a linear equation of the variable Z_j in terms of m+1 others.
It makes no difference mathematically that the one is observed and the
others hypothetical. The value of Z_j predicted by the common factors
is \hat{Z}_j,

    \hat{Z}_j = a_{j1} F_1 + a_{j2} F_2 + \cdots + a_{jm} F_m.    (11)

Harman (Reference 2, p. 18) proves that Equation 10 is a regression
equation but we may just as easily prove the more pertinent theorem
that Equation 11 is a regression equation.

The sum of squares of residuals for Equation 11 (over the N
values of the vector) is

    \sum_i (Z_{ji} - \hat{Z}_{ji})^2 = (Z_j - \hat{Z}_j | Z_j - \hat{Z}_j) = a_j^2.

Since a_j^2 = 1 - h_j^2, j = 1, ..., n, are unique for any factor analysis
solution they may be regarded as at a minimum for a set of factors (this
assumption also defines the factors as least squares estimates). Hence
Equation 11 may be regarded as a least squares solution and a regression
equation with a standard error of estimate a_j.

A regression equation is usually represented:

    Y = b_1 x_1 + b_2 x_2 + \cdots + b_m x_m + e

or simply in vector notation

    \hat{Y} = XB    (12)

where \hat{Y} is a least squares estimate to Y; Y, \hat{Y}, X are score vectors;
B is a coefficient vector. We may easily imagine Y and B extended
to matrices of column vectors. The factor analysis model uses row vectors
for scores and linear coefficients. Assuming that all variables are
standardized we may let

    Z' = Y
    F' = X
    A' = B

and

    \hat{Z} = AF,  that is  \hat{Z}' = F'A',  is equivalent to  \hat{Y} = XB.

Thus the factor pattern is a set of regression coefficients.

Thus  the  factor  pattern  is  a  set  of  regression  coefficients. 

The regression analysis solution for B is

    X'Y = X'XB

    B = (X'X)^{-1} X'Y.    (13)

The C matrix is usually defined

    C = (X'X)^{-1}

and then

(Section 2.7). Thus, letting q stand for all X_p, p = 1, ..., m, and using
standard notation for multiple correlation:

    R_{Y_j \cdot X_1 \ldots X_m} = R_{Y_j \cdot q} = \frac{(Y_j | \hat{Y}_j)}{\sqrt{(Y_j | Y_j)(\hat{Y}_j | \hat{Y}_j)}}.

But

    Y_j = \hat{Y}_j + a_j U_j,   (\hat{Y}_j | U_j) = 0,   and   (Y_j | Y_j) = 1

so that

    R_{Y_j \cdot q} = \frac{(\hat{Y}_j | \hat{Y}_j)}{\sqrt{(\hat{Y}_j | \hat{Y}_j)}} = \sqrt{(\hat{Y}_j | \hat{Y}_j)} = h_j.    (15)

Hence the squared multiple correlation of a variable on the m common
factors is its communality, or its explained variance.
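
The two results of this part, that the pattern is a set of regression coefficients and that the squared multiple correlation equals the communality, can be checked by simulation. The Python sketch below is only an illustration: the loadings, the factor correlation, the sample size, and the use of normally distributed factors are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000
Phi = np.array([[1.0, 0.6],
                [0.6, 1.0]])               # assumed factor correlations
a = np.array([-0.2, 0.9])                  # hypothetical pattern row for one variable
h2 = a @ Phi @ a                           # communality implied by Equation 6

# Correlated common factors (rows are observations) and one unique factor.
X = rng.standard_normal((N, 2)) @ np.linalg.cholesky(Phi).T
U = rng.standard_normal(N)
Y = X @ a + np.sqrt(1.0 - h2) * U          # Y_j = Y_hat_j + a_j U_j

# Least squares regression of the variable on the factors (Equation 13).
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ B
R2 = 1.0 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

print("regression coefficients:", B, "pattern:", a)
print("squared multiple correlation:", R2, "communality:", h2)
```

Up to sampling error the fitted coefficients reproduce the pattern row and the squared multiple correlation reproduces the communality.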


D. A Proposed Measure for the Importance of Oblique Factors

Methods  usually  associated  with  regression  analysis  enable  us  to 
make  a  statement  regarding  the  contribution  of  correlated  factors  to  the 
variance  of  their  dependent  variable.  It  concerns  not  the  direct 
contribution  which  we  have  shown  to  be  meaningless  in  the  oblique  case, 
but  the  amount  of  explained  variance  which  a  factor  adds  after  all  others 
have  been  taken  into  account.  Although  this  measure  is  probably  as  much 
as  can  be  said  about  the  separate  effect  of  a  factor,  it  is  a  natural  and 
useful  statement. 

As stated before, the variance due to oblique factors cannot be
simply divided among them, due to the two-factor interactions, the tendency
of factors to vary together. Hence we might search for a way to examine
the relationship between a factor and a variable with the other factors
held constant. We tend to assume that if a variable and a factor are
correlated, the factor is (mathematically) affecting the variable;
but with correlated factors the observed correlation between variable
and factor may be spurious, the result of limitations placed on their
correlation by both being tied to a second factor. Ezekiel (Reference 23,
p. 195) states the other possibility: "It is evident that a mere surface
examination of a set of data cannot reveal which independent factors are
important and which are unimportant. A factor which shows no correlation
with the dependent variable may yet show significant correlation after
the relation to other variables has been allowed for."

Consider three correlated variables. If the correlation of two
variables were measured for groups of fixed values of the third variable and
a weighted average formed, the correlation would probably be different. We
called such a measure a partial correlation (Section 2.7) and it may be
written in terms of simple correlations.


    r_{12 \cdot 3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}    (16)

This value is the correlation of one and two with the disturbing effect
of three removed. (A proof of Equation 16 may be found in Reference 62,
p. 479.) The relationship between partial correlation and multiple correlation
(explained variance) is given as follows (Reference 63, p. 344) for the
regression of Y on 4 factors:

    1 - R^2_{Y \cdot 1234} = (1 - r^2_{Y1})(1 - r^2_{Y2 \cdot 1})(1 - r^2_{Y3 \cdot 12})(1 - r^2_{Y4 \cdot 123})    (17)

This expression may be extended to the regression of Y on X_1 to X_m
by multiplying the right hand side of Equation 17 by appropriate terms in
the series. Thus if we let q represent all factors but those to the left
of the dot and in parentheses beside q


The order of X's does not matter; X_p may be any of the factors
X_1, ..., X_m. Solving for the partial correlation squared
For four factors the partial correlation squared between Y and X_2 becomes

    r^2_{Y2 \cdot 134} = \frac{(1 - R^2_{Y \cdot 134}) - (1 - R^2_{Y \cdot 1234})}{1 - R^2_{Y \cdot 134}}    (19)

Simplifying the numerator of Equation 19 yields R^2_{Y \cdot 1234} - R^2_{Y \cdot 134}.

Recalling that multiple correlation squared equals communality and
generalizing to m factors, the numerator of Equation 19 is seen to be
the difference between the explained variance of Z_j as a regression
on all m common factors and its variance as a regression on all the
common factors but one. Let this difference for the omission of factor
F_p (X_p in equivalent regression language) be denoted by v_{jp}^2.

This is the proposed measure. It has shown up while examining the
relationship between a variable and a factor with the disturbing effect of
other factors, due to co-variance, removed, as it is implicitly with
independent factors.

We may define the unique contribution to variance v_{jp}^2 of factor
F_p for variable Z_j as the additional variance explained by factor p
after all the variance of Z_j explainable by the other factors F_k, k \neq p,
has been taken into account. More formally

    v_{jp}^2 = R^2_{Z_j \cdot F_1 \ldots F_m} - R^2_{Z_j \cdot F_1 \ldots F_{p-1} F_{p+1} \ldots F_m}
             = R^2_{Z_j \cdot q} - R^2_{Z_j \cdot q(F_p)}    (20)


and we state a theorem which is proved at the end of this section. When
v_{jp}^2 is the unique contribution to variance as defined above

    v_{jp}^2 = \frac{b_p^2}{c_{pp}}    (21)

or in factor analysis notation

    v_{jp}^2 = \frac{a_{jp}^2}{(\Phi^{-1})_{pp}}    (22)


Due to the interactions this expression is about as much as can be said
about the separate "importance" of the oblique factors to the explained
variance. It represents the part of the total variance which must be
explained by that factor or be lost, a natural and meaningful measure.

Furthermore, the coefficient v_{jp}^2 is a generalized measure for all sets
of factors, orthogonal ones being a special case which happens to sum to the
total explained variance. This phenomenon exists because the interactions
are zero and thus a factor's contribution cannot be partially picked up by
another factor. Notice that then \Phi^{-1} = I and Equation 22 reduces to

    v_{jp}^2 = a_{jp}^2.

For oblique factors

    \sum_{p=1}^{m} v_{jp}^2 < h_j^2

which indicates again that part of the explained variance is not "unique"
to any one factor.
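
The proposed measure can be computed in two ways and the results compared. The Python sketch below reads Equation 21 with B taken as a pattern row and C as the inverse of the factor correlation matrix, which is an interpretation of the regression-to-factor correspondence described earlier; the function names and the numerical values are hypothetical.

```python
import numpy as np

def unique_contributions(a, Phi):
    """v_p^2 = a_p^2 / c_pp, with C the inverse of the factor correlation matrix."""
    a = np.asarray(a, dtype=float)
    C = np.linalg.inv(np.asarray(Phi, dtype=float))
    return a ** 2 / np.diag(C)

def drop_one_contributions(a, Phi):
    """Explained variance lost when each factor in turn is left out of the regression."""
    a = np.asarray(a, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    s = Phi @ a                       # correlations of the variable with each factor
    h2 = a @ s                        # explained variance using all factors
    lost = []
    for p in range(len(a)):
        keep = [k for k in range(len(a)) if k != p]
        s_k = s[keep]
        r2_without_p = s_k @ np.linalg.solve(Phi[np.ix_(keep, keep)], s_k)
        lost.append(h2 - r2_without_p)
    return np.array(lost)

Phi = np.array([[1.0, 0.6],
                [0.6, 1.0]])          # assumed factor correlations
a = np.array([-0.2, 0.9])             # hypothetical pattern row

v = unique_contributions(a, Phi)
v_drop = drop_one_contributions(a, Phi)
h2 = a @ Phi @ a
print("unique contributions:", v, "sum:", v.sum(), "communality:", h2)
```

In this example the two computations agree, and the unique contributions (0.0256 and 0.5184) add up to less than the communality of 0.634; the remainder is the variance shared between the correlated factors.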

Using Equations 15, 19, and 20 the partial correlation between Z_j and
F_p becomes

    r_{Z_j F_p \cdot q} = \frac{v_{jp}}{\sqrt{1 - h_j^2 + v_{jp}^2}}.


It is a measure of the correlation with other factors held constant, and
it might be used in a "corrected" structure matrix to help name the factors.
The square of this term is seen to be the unique contribution to variance
divided by the variance of Z_j with the other factors F_k removed, or simply
how much (as a ratio) of the otherwise unexplained variance it explains. It
is used by several authors as a measure of the importance of factor F_p.
However, v_{jp}^2 seems to be a stronger measure because it is desirable in
factor analysis to keep our "importance coefficient" in terms of variance and
absolute for inter-variable comparisons. Thus for evaluating the
importance of a factor v_{jp}^2 is preferable.

DP 

It is important to remember that the contribution to variance of an
oblique factor is not a unique value but a range of possible values. We
may think of v_{jp}^2 as a sort of lower bound to this range. Perhaps it
would be worthwhile to also set an upper bound, or to examine the
consequences if two or three factors were removed at a time. Further
investigation of the problem is needed. This section offers v_{jp}^2 as an
easily computable measure of the unique contribution to variance. Perhaps it
and other measures to come can put oblique rotations on the road to
engineering practice.

Proof  of  Equation  22:  The  elements  of  any  j  row  (or  column) 
of  (X'X)-3'  =  ^  divided  by  the  negative  of  diagonal  element  c.. 

give  the  regression  equation  of  X^  in  terms  of  the  other  X's. 


x  -  _1l  x  _. . ._  x  — iiiti  x.  -•  •  • — is  x 

Cj  D  1  Cj  j  3-1  CDD  3+1  CDD  m 


Let  us  prove  Equation  23  in  more  useful  terms,  using  the  column 
vector  Y  instead  of  X^  and  letting  X  be  a  set  of  column  vectors 
X  ,  from  the  partitioned  matrix  [Y|x] 

Then let

$$F = [Y|X]'[Y|X] = \begin{bmatrix} Y' \\ X' \end{bmatrix} [Y|X] = \begin{bmatrix} Y'Y & Y'X \\ X'Y & X'X \end{bmatrix}$$

(F is simply the matrix of correlations for the columns of [Y|X], as X'X
is for X.)

Let the partitioned inverse of F be

$$F^{-1} = ([Y|X]'[Y|X])^{-1} = \begin{bmatrix} e & D \\ D' & \ast \end{bmatrix} \qquad (24a)$$

where e is a scalar and D is a row vector; the block marked $\ast$ does not
enter the argument. Multiplying F by $F^{-1}$ gives the identity matrix,
and one of the four resulting equations is

$$X'Ye + X'XD' = 0$$

therefore

$$-\frac{1}{e} D' = (X'X)^{-1} X'Y = B. \qquad (24b)$$

But the expression in the middle of Equation 24b is the least squares
solution for the regression coefficients B in Equation 13, and e and D'
together form a column of $F^{-1}$. Hence Equation 23 is true. We may also
show from Equations 24a and 24b that

$$Y'Ye + Y'XD' = 1,$$
$$Y'Ye - Y'XBe = 1.$$


Hence

$$e = (Y'Y - Y'XB)^{-1}$$

For standardized variables Y'Y is one; and Y' is a row vector of
values while $XB = \hat{Y}$ is a column vector of least squares estimates of
those values.

Thus

$$e = \frac{1}{1 - Y'XB}$$

and

$$Y'XB = Y'\hat{Y} = \hat{Y}'\hat{Y} = R^2 \qquad (25)$$

This is the well known fact that the diagonal elements of $R^{-1}$
contain the multiple correlation of one variable on all the others.
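
The relationship between the diagonal of the inverse correlation matrix and the squared multiple correlation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the 3 x 3 matrix `R_example` are hypothetical illustrations and are not taken from the report.

```python
import numpy as np

def smc_from_inverse(R):
    """Squared multiple correlation of each variable on all the others,
    read from the diagonal of the inverse correlation matrix:
    R_j^2 = 1 - 1 / (j-th diagonal element of the inverse)."""
    R = np.asarray(R, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def smc_by_regression(R, j):
    """Same quantity for variable j, obtained from the least squares
    coefficients B = (X'X)^{-1} X'Y and then Y'XB."""
    R = np.asarray(R, dtype=float)
    others = [k for k in range(R.shape[0]) if k != j]
    Rxx = R[np.ix_(others, others)]   # X'X for the remaining variables
    rxy = R[others, j]                # X'Y
    B = np.linalg.solve(Rxx, rxy)     # least squares regression weights
    return float(rxy @ B)             # Y'XB

# Hypothetical correlation matrix, used only for illustration.
R_example = np.array([[1.0, 0.5, 0.3],
                      [0.5, 1.0, 0.4],
                      [0.3, 0.4, 1.0]])
```

Both routes should agree to rounding error for any positive definite correlation matrix, which is the content of the statement above.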

Now we may examine the effect on a least squares regression equation
of eliminating a factor by studying the effect of eliminating a
row and column (before inverting) on the inverse of a correlation
matrix.

Specifically, let $G^{-1} = (g_{ij})$ be the m x m inverse of G, where G
is formed from F as before by eliminating factor $X_u$, and let $f_{ij}$
denote the elements of $F^{-1}$.

Brownlee (Reference 62, p. 489) gives us the formula

$$g_{ij} = f_{ij} - \frac{f_{iu} f_{ju}}{f_{uu}} \qquad (26)$$

In particular, for the diagonal element of $G^{-1}$ corresponding to
vector Y,

$$g_{yy} = f_{yy} - \frac{f_{yu}^2}{f_{uu}} \qquad (27)$$
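
Equation 26 can be verified on a small example. The sketch below assumes NumPy and assumes that the $f_{ij}$ are the elements of the inverse of F (the reading under which the formula reproduces the inverse of the reduced matrix); the function name and the matrix `F_example` are hypothetical.

```python
import numpy as np

def inverse_after_deletion(F_inv, u):
    """Inverse of F with row and column u removed, computed from the
    elements of F^{-1} alone: g_ij = f_ij - f_iu * f_ju / f_uu."""
    F_inv = np.asarray(F_inv, dtype=float)
    keep = [k for k in range(F_inv.shape[0]) if k != u]
    col_u = F_inv[keep, u]
    return F_inv[np.ix_(keep, keep)] - np.outer(col_u, col_u) / F_inv[u, u]

# Hypothetical 3 x 3 correlation matrix; variable index 2 is eliminated.
F_example = np.array([[1.0, 0.6, 0.2],
                      [0.6, 1.0, 0.3],
                      [0.2, 0.3, 1.0]])
G_inv = inverse_after_deletion(np.linalg.inv(F_example), u=2)
```

Here `G_inv` should match the inverse obtained by deleting the row and column first and inverting afterwards, so no second matrix inversion is needed when a factor is dropped.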


The matrix C as defined previously may be thought of as the inverse
of the matrix F after removing the variable Y.

Thus Equation 26 becomes

$$c_{ij} = f_{ij} - \frac{f_{iy} f_{jy}}{f_{yy}}$$

Letting i=j=u and solving for $f_{uu}$ in the above equation,

$$f_{uu} = c_{uu} + \frac{f_{uy}^2}{f_{yy}} \qquad (28)$$

Equation 27 may be written


which is Equation 22.

For m=3 and p=1 Equation 22 may be written


and this equation may be found in Reference 63, p. 339.


Section  VI 

UNIQUENESS  OF  FACTOR  ANALYSIS 


6.1  INTRODUCTION 

Section VI deals with the problem of uniqueness in factor analysis. The
concept of uniqueness is described in subsection 6.2. Uniqueness is
closely related to the much more practical problem of how large a sample
one needs for a factor analysis. Subsection 6.3 shows this relationship
and then establishes a means of solving the two problems, which are
actually the one problem of uniqueness in factor analysis.

6.2  THE  ISSUE  OF  UNIQUENESS 

The issue of uniqueness can be described as follows. Two independent
teams are told to collect data and perform a factor analysis of a certain
subject matter area. The issues are described in the same way to each
team. Data collection and analysis are performed independently by each
team, independent decisions are made about factoring, and separate final
reports are drawn up. The issue of uniqueness is this: will the reports
be "basically" the same?

Of course, the issue has been transformed into one centering on the
meaning of "basically." If the picture is redrawn slightly the issues
will be clearer. Suppose, to make it more specific, that the study is
the psychological one mentioned above, focused on one large school, and
using examination results of students to uncover mental factors. As we
now impose more conditions on the picture, the reports of the teams ought
to grow more and more similar. First we require that neither team make
longitudinal studies; then we require that there be no separate
analysis for males or females, nor for school grades. Next we require
that neither team invent and administer its own test on, say, manual
dexterity or reading speed. Finally, we require that each team use the
same squared multiple correlation for communalities, and varimax rotation.

It should be clear that continuing to standardize the teams will lead
us to the point that any discrepancies between the final reports must be
due to sampling: one team selected one group of students, and the other
team selected another group. There may have been an overlap, but still
the final reports are different.

Let us now make one further change in the foregoing picture.
Suppose that one team has studied boys only, and the second has studied
girls only, and we wish to know whether the differences in their final
reports are due to sampling differences, or to sex differences. Here we
are at the crux of the issue of uniqueness. If the two teams were
measuring some simple statistic, like the classroom grade or height, the
issue could be simply resolved by the appropriate F-test or t-test, but
in factor analysis we are dealing with a highly complex set of
interrelated statistics.

6.3  SAMPLING  CONSIDERATIONS 

Although it may not appear so at first sight, the issue of uniqueness
is also very closely related to the much more practical problem of
how large a sample one ought to work with in a factor analysis. Of
course if observations are cheap, there is no problem, and the issue is
resolved by considering the clerical facilities available for copying
or punching numbers. Bad data can be freely edited out, and there is
only one question facing the investigator: is the data really
representative of the population of response about which I wish to make
inferences? More specifically, the issue can be rephrased as follows:
when an outlier is thrown out because it is unrepresentative, can I be
sure that I am drawing inferences about a population which is also
free of "unrepresentative" observations? If the answer is no, then the
investigator should not throw out such data.

Usually observations are expensive to collect, and one cannot
simply choose 1000 observations because it is a round number. The
investigator must select one sample of 200, say, and remind himself that
another investigator in doing a similar factor analysis might have
selected a different sample of 200 to work with. The conclusions of one
investigator should not contradict those of the other, no matter whether
the second investigator is real or imaginary. So here again we are at
the same issue as that posed previously as the issue of uniqueness.
This time, phrased non-technically, we call it the issue of sampling
variability. Much of the purely mathematical statistical issue has been
resolved, but the conclusions have not yet been formulated in terms of
rules of thumb which the non-professional can use.

To develop such rules, a simulation program has been written.
Basically, it sets several imaginary investigators to work on the same
data, as described above. Differences between the results of these
investigators are then examined, and in this way we can discover how
tentatively one investigator must describe his results in order not to
contradict or be contradicted by an imaginary colleague.

Since we shall assume that the data available to investigators are
normally distributed, the starting point of any such simulation will be
a need to generate multivariate normal deviates in the computer. This
issue does not seem to have been dealt with directly in the literature,
but can be solved in the following manner. Let us assume that random
normal deviates are available as needed. These can be generated directly
through any of the methods now available, or generated indirectly through
a random (rectangular) number generator plus a "normit" routine which
provides a normal deviate corresponding to any desired probability level.
The probability level of course will be obtained from the random number
generator. With such random deviates x freely available, drawn from
a standardized population with mean zero and variance unity, we desire
to generate a multivariate normal vector variable y which shall be
standardized to zero mean and unit variance, but shall have any prescribed
covariance, i.e., correlation, structure R. The positive definite
correlation matrix R, of size n x n, will then describe the population
which we are factor analysing. If R is the unit matrix, then y = x
will serve as the generated variable, but in general it will be necessary
to discover the non-symmetric matrix A, of size n x n, which has the
property that

y = Ax has correlation matrix R.

The covariance of the vector variable y is given by

$$E(yy') = E(Axx'A') = AA'.$$

Using the fact that x is standardized, we can expand the condition
that y have correlation matrix $R = (r_{ij})$ by expressing it in terms of
conditions on $A = (a_{ij})$.

There are enough degrees of freedom that we may immediately impose
the condition that y be standardized. In this case the covariance and
correlation of y are identical, and the simple condition that A must
satisfy is that

$$AA' = R.$$

Expanding this, it is a system of equations

$$r_{ij} = \sum_k a_{ik} a_{jk}$$

with diagonal elements, specifically,

$$\sum_k a_{ik}^2 = 1$$

and $r_{ij} = r_{ji}$ following trivially from the above expansion.

A simple example will illustrate the situation here. If n = 2,
we have two independent standard normal deviates $x_1$ and $x_2$, and wish
to manufacture two other variates $y_1$ and $y_2$ with the property that
they have a desired correlation r with each other. It can easily be
verified that if $y_1$ and $y_2$ are defined as

$$y_1 = x_1$$
$$y_2 = r x_1 + \sqrt{1 - r^2}\, x_2$$

they will have the desired property. The matrix A thus defined is
easily constructed for n = 2. For n = 3 it will be seen that the
algebra becomes complicated. There will be three specified correlations
$r_{12}$, $r_{13}$ and $r_{23}$. Building up the desired variates as before,
we have

$$y_1 = x_1$$
$$y_2 = r_{12} x_1 + \sqrt{1 - r_{12}^2}\, x_2$$
$$y_3 = r_{13} x_1 + a x_2 + \sqrt{1 - r_{13}^2 - a^2}\, x_3, \qquad a = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}}$$

where a is the coefficient of $x_2$ in $y_3$. Although the algebra rapidly
becomes impossibly complex, the process is very straightforward and can
easily be built into an algorithm for use in a computer. Specifically, if
i exceeds j, $a_{ij}$ will be developed in such a way as to produce the
desired correlation $r_{ij}$, and if i equals j, the coefficient will
be developed so as to ensure unit variance of the corresponding $y_i$.
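
The row-by-row construction just described is what is now usually called a Cholesky factorization. The following is a minimal sketch, assuming NumPy and taking y = Ax with A lower triangular so that AA' = R; the function names are hypothetical, and this is not a transcription of the original UNIVAC routine.

```python
import numpy as np

def triangular_factor(R):
    """Build a lower-triangular A with A @ A.T == R, one row at a time."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            # i exceeds j: coefficient chosen to reproduce the correlation r_ij
            A[i, j] = (R[i, j] - A[i, :j] @ A[j, :j]) / A[j, j]
        # i equals j: coefficient chosen so that y_i has unit variance
        A[i, i] = np.sqrt(R[i, i] - A[i, :i] @ A[i, :i])
    return A

def correlated_normals(R, N, rng):
    """Return an N x n array whose rows are independent draws of y = A x."""
    A = triangular_factor(R)
    x = rng.standard_normal((N, A.shape[0]))  # independent N(0, 1) deviates
    return x @ A.T                            # each row is (A x)'
```

For n = 2 and r = 0.6 the factor has second row (0.6, 0.8), which is the two-variable example given above. In practice `numpy.linalg.cholesky` returns the same matrix and would normally be preferred.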

The  above  transformation,  or  one  similar  to  it,  has  been  used  in 
other  connections  by  various  authors,  but  ordinarily  for  the  opposite 
purpose,  namely  to  provide  uncorrelated  variates  from  correlated  ones. 

It is important to note that the transformation will produce
conservative results. That is, the correlation matrix will be treated
as if it were a population matrix, even though it is only a sample matrix.
Thus the later sample correlation matrices which are developed by the
algorithm will be more like the original sample matrix than they really
"ought" to be. The only alternative would involve building a model based
on "fiducial" distributions of population parameters, and strong exception
would be taken to this procedure by many investigators. The results
coming from the program are striking enough that the conservatism is not
objectionable.

A computer program for the UNIVAC 1105 has been developed
incorporating the above algorithm. Basically it contains the following
procedure.

1. Reads the input correlation matrix R, the sample size N to
be used, the matrix size n, and the number of iterations to be made of
the program (see below).

2. Generates N vector variables which are pseudo-random
samples from a normal population with correlation matrix R.

3. Forms the correlation matrix of these variables.

4. Calculates the squared multiple correlation estimates of the
n communalities, and performs the principal-factor solution to the factor
analysis.

5. Prints out the sample correlation matrix and associated
communality estimates, and the characteristic roots and scaled vectors
of the solution. Saves the answers in computer binary format to be
used below and in the varimax rotation program.

6. Repeats steps 2 to 5 above the number of times requested
in the iteration parameter in step 1 above.

7. Calculates the averages and variances of all the eigenvalues
and eigenvectors and prints these out.

8. Returns to step 1 above unless directed to terminate the
program.
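
Steps 2 to 7 of the procedure can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the original program: it assumes NumPy, uses `numpy.linalg.cholesky` for the triangular factor, and summarizes only the characteristic roots (averaging eigenvectors would first require a sign convention, which the report does not describe). All function names are hypothetical.

```python
import numpy as np

def principal_factor_roots(R_sample):
    """Roots and scaled vectors of the reduced correlation matrix,
    i.e. the sample matrix with SMC communalities on its diagonal."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R_sample))
    reduced = R_sample.copy()
    np.fill_diagonal(reduced, smc)
    roots, vectors = np.linalg.eigh(reduced)
    order = np.argsort(roots)[::-1]            # largest root first
    roots, vectors = roots[order], vectors[:, order]
    # scale by sqrt(|root|); roots of a reduced matrix may be negative
    loadings = vectors * np.sqrt(np.abs(roots))
    return roots, loadings

def simulate(R, N, iterations, seed=0):
    """Average and variance of each ordered root over repeated samples."""
    rng = np.random.default_rng(seed)
    A = np.linalg.cholesky(R)
    all_roots = []
    for _ in range(iterations):
        y = rng.standard_normal((N, R.shape[0])) @ A.T   # step 2
        R_sample = np.corrcoef(y, rowvar=False)          # step 3
        roots, _ = principal_factor_roots(R_sample)      # step 4
        all_roots.append(roots)
    all_roots = np.array(all_roots)
    return all_roots.mean(axis=0), all_roots.var(axis=0, ddof=1)  # step 7
```

A call such as `simulate(np.eye(10), N=100, iterations=6)` corresponds to one cell of the independent-data design; the particular seed and iteration count are arbitrary choices for illustration.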

The purpose of the program was two-fold: first, to see how the
stability of estimates increases as sample size N increases, and
secondly to see how this same stability is influenced by the correlation
structure. Most of the runs were performed with independent population
data, so that roots and vectors were calculated from data with the
identity correlation matrix. The number of iterations under various
parameter combinations is given in the following table. A non-orthogonal
design was used because of the machine time and costs associated with
large variate sizes, and was close to optimum when these costs are
considered as part of the design.

Table 7

Number of Computer Runs of Factor Analysis
on Independent Data

                      Sample Size
No. of
variates       100      200      300     Total

   30                     3                  3
   20           11                          11
   10            6        6        6        18

 Total          17        9        6        32

In  addition,  11  iterations  were  made  on  the  classical  24  psychological 
variate  test  data  from  the  Spearman-Holzinger  Unitary  Trait  Study,  used  by 
Harman  (Reference  2)  and  others.  The  same  N  of  145  was  used  as  in  the 
original  study,  and  of  course  the  number  of  variates,  n  ,  was  taken  as  24. 

Some  general  conclusions  of  practical  relevance  are  as  follows. 


1. When we are sampling from independent data, the use of squared
multiple correlation (SMC) communalities tends to create "errors of the
first kind." That is, it leads to production of one or even two roots
which are relatively larger than all the others. For instance, in one of
the three iterations for N = 300, n = 30, the two largest roots
were .95 and .91, followed by much smaller roots .37, .25, .23, etc.

To describe the situation intuitively, what happens is that if the
sampling from independent data appears by accident to produce something
that looks significant, the SMC procedures jump on it and try to make it
look good.

2. When SMC communalities are used, common practice is to assume
that there will be one insignificant positive root for every negative
root. From the previous paragraph, it can be further suggested that one
and possibly two further small positive roots can be assumed insignificant
because of the SMC bias mentioned above. Of course, unless the roots
are much in excess of unity there can be no significance imputed to them
in any case. Insignificant roots tend to be largest when N is small
and n is large, as one might expect. When N = 100, n = 20, for
instance, half of the largest roots exceeded unity, but none was larger
than 1.18.

3. The sampling variability of the largest root in independent data
is surprisingly independent of both N and n. The variance is
approximately .01. It is much larger if the root is significant, but the
coefficient of variation, i.e., the standard deviation relative to the mean,
remains relatively stable at about 10 per cent. This can serve as the rule
of thumb for largest eigenvalues.

4.  The  sampling  variability  of  the  eigenvectors  corresponding  to  the 
largest  root  in  independent  data  depends  on  both  N  and  n.  The  variance 
of  the  eigenvectors  decreases  approximately  as  the  inverse  of  the  square 
root  of  N.  Thus  it  is  relatively  insensitive  to  changes  in  N.  To 
illustrate,  if  N  =  100,  n  =  10,  the  variance  is  .048  and  the  standard 
deviation  of  course  is  .22.  If  we  take  four  times  as  many  observations, 
the  variance  is  reduced  by  one-half,  and  the  corresponding  standard  deviation 
is  .155,  not  a  great  improvement  on  .22  considering  the  quadrupling  of  data 
involved. 

This  sampling  variance  also  diminishes  as  n  increases.  However,  the 
relationship  here  is  much  more  complex,  and  the  experimental  design  used  in 
collecting  the  data  did  not  permit  high  clarification  on  this  point.  As  a 
tentative  approximation,  it  appears  that  the  sampling  variance  diminishes 
as  the  inverse  of  n. 

The foregoing conclusions seem to be at variance with those of Harman
(Reference 2, Appendix, Table B), but comparison is not possible since his
results do not apply to independent data. From practical experience, it
seems desirable to make the pessimistic assumption that the numerical
information one has collected does not look encouraging and that the
investigator would be happy to find any significant pattern at all in it.

The rule of thumb suggested from the foregoing is that the variance of
the eigenvectors corresponding to largest roots in independent data is
$5/(n\sqrt{N})$.

5. When the data contain significant material, the sample eigenvectors
do have a population value to gravitate towards, and hence the sampling
variability of the coefficients diminishes. In the Unitary Trait data, the
variance of the eigenvectors corresponding to the largest root was .004.
For smaller roots, where sampling error only is being measured, the variance
increases to where it corresponds to that of independent data, as one would
expect. This variance of .004 is only one-tenth what one would expect if
the same parameters had operated on independent data.

Unfortunately, no dependable rule of thumb can be inferred which would
apply to all data. It will depend on how strong the population eigenvector
is to which the sample is tending. As a very crude first approximation,
one might measure this strength by means of the largest eigenroot V, and
hence adopt $5/(Vn\sqrt{N})$ as a rule of thumb for the variance.

One  further  important  possibility  opens  to  the  investigator  because  of 
the  relative  unimportance  of  the  size  of  N.  If  the  investigator  has  say 
400  observations,  he  can  do  one  analysis  on  all  the  data,  and  then  divide  the 
data  in  half  at  random  and  do  separate  analyses  on  each  half  as  well.  Because 
the  data  from  200  observations  will  be  nearly  as  "good"  as  that  from  400, 
it  follows  that  any  factor  that  seems  to  appear  in  the  analysis  of  the  400 
observations  is  dependable  only  if  it  can  also  be  discerned  in  the  analysis 
of  each  of  the  two  halves  of  the  data. 

6.  The  most  unexpected  result  of  this  investigation  is  that  with  the 
use  of  SMC  communalities  on  the  Unitary  Trait  data,  it  can  be  statistically 
established  through  the  sampling  scheme  used  here  that  there  is  only  one 
significant  factor  in  the  data,  rather  than  four  (e.g.,  Harman,  Reference  2, 
Table  9.22). 

To begin establishing these results, it is instructive first to compare
the difference which the choice of communality imposes on the size of
eigenvalues. The averoid and bi-factor data in Table 8 are from Harman
(Reference 2, Table 9.21). The calculations were actually performed on
different computers, as well, but Harman establishes (Reference 2,
Table 9.23) that only very minor discrepancies can be associated with
computer-to-computer differences. Major differences are due primarily to
choice of communality.

It is strikingly evident that both with SMC communalities for the
correlation matrix, and also with the average of eleven eigenvalues based
on sample matrices from this matrix, the significance has all been
concentrated into a single general factor. The similarity of this factor to
the general factor based on other communality estimates is given in Table 9.




Table 8

Relation Between Communality Estimate and Eigenvalues
of 24-variable Matrix, With Sampling Error

               Communality Estimate        Sample    S. Dev.
Order     Averoid   Bi-Factor     SMC       SMC     (10 d.f.)

   1        7.63      7.66       7.66      7.55       .57
   2        1.65      1.65        .38       .56       .33
   3        1.17      1.18        .38       .34       .06
   4         .90       .96        .29       .29       .07
   5         .40       .42        .24       .26       .06
   6         .35       .40        .23       .21       .06
   7         .27       .31        .20       .18       .04
   8         .25       .30        .18       .15       .04
   9         .21       .23        .14       .12       .04
  10         .14       .16       -.04       .10       .04
  11         .07       .19        .00       .08       .04
  12         .01       .05       -.01       .05       .04
  13         .00       .03       -.08       .03       .06
  14        -.04      -.01       -.08       .01       .06
  15        -.08      -.07       -.10      -.02       .06
  16        -.09      -.07       -.12      -.05       .07
  17        -.13      -.09       -.15      -.08       .07
  18        -.16      -.14       -.15      -.09       .08
  19        -.18      -.16       -.17      -.12       .08
  20        -.20      -.19       -.25      -.16       .05
  21        -.24      -.21       -.27      -.20       .05
  22        -.26      -.23       -.39      -.23       .05
  23        -.31      -.27       -.43      -.28       .06
  24        -.34      -.31       -.49      -.34       .08

It is clear that there is general agreement between this main factor
calculated in the various ways. The first value in the "Average SMC"
column, for instance, .580, is the arithmetic average of 11 values, each in
turn calculated from a sample of 145 observations. Those eleven values
range from .472 to .669 with a standard deviation, as indicated, of .072.
(There is a slight downward bias in these averages as calculated, because
they have been scaled in the square metric to the eigenroot, and any
averaging of the numbers ought to be done in the same way instead of
arithmetically as here.)

Table 9

Comparison of a General Factor in 24-variable Matrix
as Identified by Alternative Communality
Estimates, With Sampling Error

                     Population   Average    S. Dev.
Test      Averoid       SMC         SMC     (10 d.f.)

   1        .596        .595        .580      .072
   2        .373        .376        .367      .077
   3        .418        .425        .421      .072
   4        .484        .487        .451      .048
   5        .689        .690        .658      .063
   6        .685        .686        .684      .053
   7        .676        .673        .663      .044
   8        .676        .678        .647      .051
   9        .693        .693        .675      .065
  10        .466        .463        .463      .077
  11        .557        .560        .550      .049
  12        .466        .468        .485      .068
  13        .601        .600        .593      .060
  14        .425        .424        .423      .058
  15        .391        .390        .364      .058
  16        .506        .509        .498      .050
  17        .465        .465        .462      .078
  18        .520        .519        .511      .099
  19        .444        .451        .483      .098
  20        .616        .619        .631      .063
  21        .595        .598        .597      .030
  22        .612        .614        .615      .041
  23        .690        .693        .686      .057
  24        .651        .653        .656      .040

   V       7.628       7.665       7.550      .58

So it would appear that the averoid-based general factor might have been
hit upon by chance due to sampling the data and calculating SMC-based
communalities. The surprising thing however is that the averoid-based factor
is only one of four (see Harman, Reference 2, Table 9.22) whereas all the
SMC-based samplings succeed in concentrating all the factor information into
a single factor.

A somewhat similar sampling relationship will come out if we compare
communality estimates. Briefly, the averoid estimate for the first test was
.505, whereas the sampling provided 11 SMC estimates, varying from .428 to
.666 and averaging .579. However, here we begin to discern the discrepancies
which produce further factors in one case but not in the other. Nine of
these eleven SMC estimates exceed the averoid estimate. In later tests all
11 SMC-based communalities exceed the corresponding averoid communality.

The result of this is that the factors of verbal rigidity, spatial, and
memory, discerned by averoid-based communalities, are all absorbed into the
general factor of the SMC-based communality. The 11 eigenvalues of the
second factors have an average value of .556 as given in Table 8. Only
two of the eleven exceed unity, and these two do not have the same sign
pattern as any of the factors of Harman (Reference 2, Table 9.22).

As an objective test of the insignificance of the second factor, a
sign test was made of the eigenvectors from the eleven samplings. If any
significant weight, plus or minus, was in this second factor, then there
would be a tendency for plus or minus signs to occur opposite that test in
each of the eleven iterations. With 11 iterations and half of the weights
minus, a non-parametric 1 per cent test would consist of 0, 1, 10, or 11
like signs corresponding to one of the 24 tests. None were observed. If
we weaken the test to comprise 0, 1, 2, 9, 10, or 11 like signs, we have
a six per cent test, and would expect to find 1.6 of the 24 tests with
these sign compositions. In fact we found two, test 6 with nine minus
signs and test 18 with two minus signs, just about as expected. Further,
these signs are the opposite of what we would expect if we were measuring
the verbal rigidity factor, the number two factor of the averoid analysis.
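
The significance levels quoted for the sign test follow from the binomial distribution with 11 trials and probability one-half. A short sketch of that arithmetic, using only the Python standard library; the function name is hypothetical:

```python
from math import comb

def two_sided_sign_tail(k, trials=11):
    """Probability of at most k, or at least trials - k, minus signs when
    each of the `trials` signs is plus or minus with probability 1/2."""
    tail = sum(comb(trials, i) for i in range(k + 1))
    return 2 * tail / 2 ** trials

p_strict = two_sided_sign_tail(1)   # 0, 1, 10 or 11 like signs: 24/2048
p_weak = two_sided_sign_tail(2)     # 0, 1, 2, 9, 10 or 11 like signs: 134/2048
expected_tests = 24 * p_weak        # expected count among the 24 tests
```

These give roughly 0.012, 0.065 and 1.57, in line with the "1 per cent" test, the "six per cent" test and the expected 1.6 tests mentioned in the text.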

The conclusion here was quite unexpected but seems inescapable: the
use of SMC communalities contributes far more than expected to the parsimony
with which the relationships in the Unitary Trait data can be described.

It would seem that if further factors are to be discerned in the data,
a much larger sample size must be employed.

Let  us  summarize  the  practical  results  of  the  foregoing  analysis,  as 
it  touches  the  issues  of  uniqueness  and  sampling,  i.e.,  how  sure  the 
investigator  can  be  of  his  results. 

1.  The  coefficient  of  variation  of  the  largest  root  is  10  per  cent. 

2. The variance of the eigenvectors associated with this root is
$5/(Vn\sqrt{N})$.


3.  One  hundred  to  two  hundred  observations  on  each  of  the  n  variates 
should  be  enough.  If  more  than  200  can  be  collected,  split  the  data  in  half 
at  random  and  run  each  half  separately  as  well. 

4.  Use  SMC  communalities  and  make  all  factors  beyond  the  first  prove 
their  existence  before  you  accept  them. 
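
The eigenvector rule of thumb is easy to tabulate when planning a sample size. The sketch below reads the garbled formula as 5/(V n sqrt(N)), with V = 1 for independent data; that reading is an assumption based on the N = 100, n = 10 example in the text (variance .048), and the function name is hypothetical.

```python
import math

def eigenvector_variance_rule(n, N, V=1.0):
    """Rule-of-thumb sampling variance of the eigenvector coefficients for
    the largest root: 5 / (V * n * sqrt(N)). Use V = 1 for independent data."""
    return 5.0 / (V * n * math.sqrt(N))

var_100 = eigenvector_variance_rule(n=10, N=100)   # 0.05, text reports .048
var_400 = eigenvector_variance_rule(n=10, N=400)   # 0.025, half of var_100
sd_400 = math.sqrt(var_400)                        # about 0.158
```

Quadrupling N only halves the variance, which is the basis for the advice to split a large sample in half and analyse each half separately.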
SECTION VII


APPLICATION  OF  FACTOR  ANALYSIS 

7.1  INTRODUCTION 

Three  applications  of  factor  analysis  are  presented  in  this 
section.  Subsection  7.2  contains  explanations  of  how  factors  are 
interpreted  for  psychophysiological  data.  These  examples  should 
give  greater  insight  into  the  interpretation  problem  in  general. 

Factor  Analysis  is  presented  purely  as  a  representation  technique 
in  subsection  7.3. 

7.2 FACTOR ANALYSIS OF PERSONAL HISTORY AND ANTHROPOMETRY DATA

Included in this section are two factor analytic studies which
were performed on data collected in a psychophysiology experiment.*
The first analysis is of personal history data ascertained from the
subjects by a questionnaire which contained approximately 150 items.
Many of the variables were derived from more than one response, and
some of the original items were deleted since they were discrete
data points. After careful quantifying and scrutinizing, 41 variables
were retained. Eighty-eight subjects were used. In this and the
following study, the subjects were University of Dayton students.

The second study is concerned with 106 anthropometric measurements
taken on 131 subjects in the same experiment. Unlike the
personal history variables, this data set was already quantified.
Variables included a number of heights, breadths, circumferences,
and diameters.

In both studies, the principal components method was applied
using unities as an estimate of communality. The number of eigenvalues
greater than one was used as a completeness criterion, i.e.,
for determining the number of factors to be rotated. Varimax was
the method of rotation employed for both.

The personal history data produced 14 factors. It is important
to stress at this point that one must be extremely familiar with the
variables and what they represent, as well as the make-up of the
subject group, before a meaningful interpretation of the factors
can be made. In this case, all the factors could be identified
conceptually.

*These factor analyses were performed under Contract AF33(615)-1119
monitored for the U. S. Air Force by Major Victor H. Thaler, 6570th
Aerospace Medical Research Laboratories, Wright-Patterson Air Force
Base.

The  best  approach  to  interpreting  the  various  factors  is  to 
examine  them  one  by  one  and  note  those  variables  which  have  the 
highest  loadings.  For  example,  in  Table  10  it  can  be  seen  that 
Major  Subject  (.90),  Educational  Goals  (-.74),  and  Vocational 
Plans  (.88)  have  the  highest  loadings  in  factor  1,  indicating 
that the factor is associated with educational-vocational plans.
Since  Vocational  Plans  and  Major  Subject  were  ranked  from  "academic" 
to  "applied"  in  nature,  and  Educational  Goals  was  rated  in  the 
direction  of  higher  educational  motivation,  it  would  seem  logical 
to conclude that the more applied the subject's interest, the
less likely it is that advanced degrees (Law, Master's, Doctorate)
are desired. In addition, variables such as Home Address-Distance
(.38) and Full Scale IQ (-.31) should be considered. This is
where familiarity with the data is necessary. It was concluded
here  that  I.Q.  probably  had  a  tendency  to  relate  to  higher 
educational  goals  and  more  academic  interests.  However,  Home 
Address-Distance,  which  is  the  distance  between  the  subject's 
home  and  the  university,  was  thought  to  be  a  less  universal  value. 
The  relationship  is  likely  peculiar  to  this  university  because 
of  its  academic  standards.  Thus,  this  factor  would  be  considered 
in  terms  of  educational  and  vocational  plans,  with  some  degree 
of  ability  being  associated. 
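
The procedure of noting the highest loadings on a factor can be sketched as follows. The loadings are those quoted above for factor 1; the two cutoff values and the function name are assumptions made for illustration and are not part of the original analysis.

```python
# Loadings on factor 1 as quoted in the text above.
factor_1 = {
    "Major Subject": 0.90,
    "Educational Goals": -0.74,
    "Vocational Plans": 0.88,
    "Home Address-Distance": 0.38,
    "Full Scale IQ": -0.31,
}

def salient_loadings(loadings, cutoff):
    """Variables whose absolute loading is at least `cutoff`, largest first."""
    chosen = [(name, value) for name, value in loadings.items()
              if abs(value) >= cutoff]
    return sorted(chosen, key=lambda item: abs(item[1]), reverse=True)

primary = salient_loadings(factor_1, cutoff=0.60)    # hypothetical cutoff
secondary = salient_loadings(factor_1, cutoff=0.30)  # hypothetical cutoff

for name, value in primary:
    print(f"{name:25s} {value:+.2f}")
```

The sign of each loading is kept in the listing because, as in the discussion above, the direction of scoring must be known before the factor can be named.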

The second factor is quite straightforward to interpret. It
is obviously related to socio-economic level. The variables
Income of Father (.71), Socio-economic Rating (-.77), Education
of Father (.81), and Education of Mother (.68) load most highly,
indicating a strong relationship between education and income.
Note that Socio-economic Rating is negative in direction because
of the scoring technique employed, i.e., the higher the level,
the lower the score. Furthermore, the educational achievement of
the father appears to be most important. Thus, the variables
indicate the factor is a measure of socio-economic level.

The remaining factors may be interpreted in a similar manner.
The interaction of the variables in some factors is more subtle
and complicated, however, requiring a greater degree of insight.

The  second  example  of  a  factor  analytic  study  is  that  of 
106  relatively  homogeneous  anthropometric  measurements.  As  seen 
in  Table  11,  17  factors  were  produced  according  to  the  eigenvalue 
criterion. The great number of high correlations between the
variables is responsible for the dramatic reduction from variables
to factors, to approximately one-sixth of the original number.

Using  the  same  interpretation  technique  as  in  the  previous 
example,  it  can  easily  be  observed  which  variables  load  most 
highly  on  each  factor.  Factor  1  in  Table  11,  for  example,  has 
a  large  number  of  variables  loading  at  0.6  and  above.  Both 
present  and  maximum  weight  of  the  subject  load  with  a  number  of 
body  breadths  such  as  shoulder,  chest,  waist,  and  buttock,  as 
well as various depths, circumferences, and skinfolds. Also
showing some importance are the somatotypes. The first, or
endomorphic, somatotype depicts the amount of softness and roundness
characteristic of the subject's body, while the "G"
(gynandromorphic) somatotype is concerned with the degree of
femininity in the body. In addition, the third, or ectomorphic,
somatotype loads at -0.45, and should be considered since it depicts
the lean or frail body.

Consequently,  the  various  items  point  toward  a  factor  which 
explains  general  body  size,  but  not  height.  The  breadths,  depths, 
and  circumferences  which  load  are  those  of  the  trunk,  and  do  not 
include  the  extremities.  The  somatotypes  must  be  considered,  as 
they help clarify and confirm the nature of the factor. Obviously,
the endomorphic and gynandromorphic body would have greater measurements
on the pertinent variables, while the opposite would be true
for  the  ectomorphic  body.  Thus,  this  factor  can  be  labeled  general 
trunk  dimensions. 

The second factor has its major loadings on measurements of
height and, naturally, stature. Again, the loadings are extremely
high, usually in the range of 0.80 to 0.95. To a lesser extent, hand
and foot dimensions appear in this factor. This is understandable,
however, since taller people usually have longer hands and feet.

Of equal importance in this factor is the third, or ectomorphic,
somatotype with a 0.76 loading. The long, lean body which it
represents is in line with the general nature of this factor.
Therefore, it would probably be labeled as stature.

Continuing,  the  third  factor  may  be  interpreted  as  being 
relevant  to  grip  strength  or  arm  muscle.  The  highest  loadings 
appear  on  the  three  grip  strength  variables,  while  minor  loadings 
appear on the biceps and forearm measures. Although the latter
are about one-half the size of the grip strength variables, the
fact that all other loadings are negligible and that the biceps
and forearm dimensions logically relate to strength necessitates
their inclusion in interpretation.

By  examination,  factor  4  is  a  testicle  factor,  and  factor  5 
is  a  penis  factor.  Factor  6  is  concerned  with  dimensions  of  the 
head,  while  factor  7  pertains  more  to  facial  measurements.  The 
various  measurements  of  the  hands,  wrist,  and  feet  comprise  the 
eighth factor. Similarly, the remaining factors may be defined
by careful examination of significant loadings and consideration
of their conceptual importance.

While  the  above  factors  are  all  fairly  clear  because  of  the 
nature  of  the  variables,  this  is  not  always  the  case.  Therefore, 
it must be reiterated that without a complete understanding of
the nature of the variables and the subject population, no meaningful
interpretation can be made.

7.3  FUNCTION  REPRESENTATION 

A function may be represented in many different ways. The
function "sine" has, for example, a Taylor series representation,
a continued fraction representation, an infinite product representation,
and a Chebyshev series representation.

The choice of method for representing the function depends on
the purpose for which the representation is to be used. If it is
desired to study a certain property of a function, a representation
is chosen which is known to highlight that class of properties. A Fourier
series representation may be chosen when the frequency content of a function
is of interest, for example. When the purpose includes evaluation of the
function, properties of the representation such as speed and region of
convergence help dictate the choice.

The properties of various classes of representation techniques have
been the subject of much interest in the history of mathematics, and probably
the most studied class of representation techniques has been that of
orthogonal function expansions. This is so because the properties of
orthogonal function expansions have been found most desirable and useful
in practice. However, a set of orthogonal functions is usually obtained
in practice by solving differential equations. Thus, in order to have
a set of orthogonal functions which reflect the properties of a class of
functions, a differential equation must be associated with that class of
functions.

Out of the proliferation of different orthogonal sequences such as
the Legendre, Chebyshev, Laguerre, and Hermite polynomials came the
unifying statement that all of these classical polynomials, $\phi_n(x)$, when
multiplied by a particular weight function are solutions to the second
order differential equation

$$G(x)\,y'' + \{2G'(x) - H(x)\}\,y' - \left[\frac{n^2 - n - 2}{2}\,G''(x) + (n + 1)\,H'(x)\right] y = 0$$

where $y_n(x) = w(x)\,\phi_n(x)$. The effect of this statement was to provide a
channel through which theory on one orthogonal sequence could be applied
to another orthogonal sequence.

In  the  last  five  years  physical  scientists  have  shown  interest  in 
other  methods  which  yield  sets  of  orthogonal  functions.  The  method  about 
to  be  discussed  may  be  characterized  by  an  attempt  to  represent  each  member 
of  a  set  of  functions  by  a  linear  combination  of  nonlinear  functions  which 
span  the  space  of  possible  given  functions.  The  method  obtains  the  basis 
functions  by  analysis  of  a  symmetric,  positive  semidefinite  matrix  obtained 
from the given functions by various methods. Moreover, it is possible to
obtain a set of orthogonal basis functions which contribute maximally and in a
decreasing manner to the total variance of the given functions.

In what follows the given functions $x_i(t)$ $(i = 1, 2, \ldots, n)$ will be
represented discretely at N values of t by a vector with $j^{\text{th}}$ component $x_{ij}$.

In factor analysis the given functions are first standardized by transforming
to functions with zero mean and standard deviation of one. A correlation
matrix, with elements $r_{ij}$, is then formed. Factor analysis provides very many
methods for analyzing the correlation matrix, including principal components.
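
The standardization and the formation of the correlation matrix may be sketched as follows. This is a minimal illustration, not the procedure of any particular package: the sample values are hypothetical, the function names are assumptions, and the standard deviation is taken with divisor N (also an assumption).

```python
import math

def standardize(values):
    """Rescale sampled function values to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Standard deviation with divisor N (an assumption of this sketch).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation_matrix(functions):
    """functions[i][j] is function i sampled at the j-th value of t."""
    z = [standardize(f) for f in functions]
    n_points = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / n_points for zk in z]
            for zi in z]

# Three hypothetical functions sampled at N = 5 values of t.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.2, 9.8],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
R = correlation_matrix(samples)
for row in R:
    print(["%6.3f" % r for r in row])
```

The diagonal elements of R are unity because each standardized function has unit variance.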

The method of principal components depends on obtaining a representation of
the transformed given function as

$$x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{in}F_n$$


where it is assumed that the $\{F_j\}$ are orthogonal functions. The method of
principal components is based on the ability to spectrally resolve a linear
symmetric operation into

$$R = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_n e_n e_n'$$

where $e_j$ is the normalized eigenvector corresponding to the eigenvalue $\lambda_j$
of R. Then based on this spectral resolution of an operator, when

$$a_{ij} = \sqrt{\lambda_j}\; e_{ij}$$

is chosen ($e_{ij}$ being the $i^{\text{th}}$ component of $e_j$),

$$a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{in}F_n$$

will indeed represent $x_i$ since

$$XX' = (AF)(AF)' = AFF'A' = AA' = R$$
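
The choice of coefficients may be checked numerically. The sketch below assumes a cyclic Jacobi rotation scheme for the latent roots and vectors of a small symmetric matrix; the 3 x 3 correlation matrix is hypothetical and the function names are assumptions. It forms $a_{ij} = \sqrt{\lambda_j}\,e_{ij}$ and confirms that AA' reproduces R.

```python
import math

def jacobi_eigen(matrix, sweeps=50, tol=1e-24):
    """Latent roots and vectors (columns of v) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that annihilates a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp)
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the vectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def principal_component_loadings(R):
    """Loadings a_ij = sqrt(lambda_j) * e_ij, factors ordered by decreasing root."""
    roots, vectors = jacobi_eigen(R)
    order = sorted(range(len(R)), key=lambda j: -roots[j])
    loadings = [[math.sqrt(max(roots[j], 0.0)) * vectors[i][j] for j in order]
                for i in range(len(R))]
    return loadings, [roots[j] for j in order]

# Hypothetical correlation matrix with unities in the diagonal.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
A, roots = principal_component_loadings(R)
reproduced = [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(3)]
              for i in range(3)]
print("latent roots:", [round(r, 4) for r in roots])
```

Ordering the columns by decreasing latent root gives factors whose contributions to the total variance decrease from the first to the last.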

Other methods of factor analysis make use of the full factor analysis model
which includes unique factors or functions:

$$x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + a_i U_i .$$

The purpose of this model which includes unique factors is to reduce to a
minimum the number of factor functions $\{F_k\}$ which contribute to more than one
given function $x_i$.

There  are  methods  in  factor  analysis  for  obtaining  a  set  of  factor  functions 
which  are  not  orthogonal  but  oblique.  These  oblique  factor  functions  are  chosen 
so  as  to  demonstrate  the  properties  of  the  class  of  given  functions  in  some  way 
better than the orthogonal factor functions.

Orthogonal rather than oblique functions are usually used for function
representation since they have such nice properties and are easy to handle.
Indeed, going from orthogonal to oblique functions is like going from linear
to nonlinear systems.

Nonlinear systems are, however, many times closer to reality. Just so,
when it is desired to have the basis functions or factors represent concepts
or actual causes of variance, oblique factors must be allowed since most
conceptual causes of variance are related (and therefore not independent or
orthogonal). Let us then consider the effect of these statements on the theory.

When the factor analysis of functions is stated as the problem of finding
matrices A and F such that

Z = AF (1)

where Z is the matrix ensemble of function vectors, A and F are
underdetermined. There are an infinite number of matrices A and F which
will satisfy Equation 1, just as there exist an infinite number of pairs of
vectors which will span a two-dimensional space. In the principal components
factor analysis discussed earlier, the condition of maximal, decreasing
contributions to variance fixed the matrices and made the problem determinate.

If  initially  we  have  any  A  and  F  satisfying  Equation  1,  we  may 
find  others  by  "rotating"  the  given  factors,  i.e.  by  transforming  each  of 
the  given  factors  by  an  orthogonal  transformation  matrix  T.  For  example, 
in  2-space  the  factors  may  be  rotated  as  shown  in  Figure  7. 
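
That a rotation yields another pair of matrices satisfying Equation 1 may be verified with a small sketch. The matrices and the 30 degree angle are hypothetical, and the helper names are assumptions; the point illustrated is only that (AT)(T'F) = AF when T is orthogonal.

```python
import math

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def rotation(angle):
    """Orthogonal transformation matrix T for a rotation in 2-space."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

# Hypothetical coefficients (3 functions by 2 factors) and
# factors (2 factors by 4 sample points).
A = [[0.8, 0.3], [0.7, 0.4], [0.2, 0.9]]
F = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]

T = rotation(math.radians(30.0))
A_rot = matmul(A, T)             # coefficients on the rotated factors
F_rot = matmul(transpose(T), F)  # the rotated factors themselves

Z = matmul(A, F)
Z_rot = matmul(A_rot, F_rot)     # equals Z because T T' = I
print(max(abs(Z[i][j] - Z_rot[i][j]) for i in range(3) for j in range(4)))
```

Any angle may be substituted for the one used here, which is why Equation 1 alone leaves A and F underdetermined.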

In  the  analysis,  the  new  oblique  factors  are  derived  by  rotating  a 
given  (usually  orthogonal)  system  to  a  new,  preferred  oblique  system.  It 
is  postulated  that  a  set  of  factors  is  more  meaningful  when  each  factor 
goes  through  a  separate  "cluster"  of  functions  (when  a  group  of  functions 
are  similar,  their  vector  representations  will  be  close  to  each  other  in 
space). 

In attempting to find mathematical statements equivalent to this
intuitive statement, most approaches reason as follows: When a factor
passes through a cluster of functions, the coefficients of that factor
for the nearby functions will be large while the coefficients of other
factors for this cluster of functions will be small. This rationale has
found its mathematical expression in the maximization or minimization of
various functions of powers of various coefficients.

[Figure 7. Rotated Factors (two panels: "Initially" and "After Rotation")]

Representation using oblique factors is not quite so simple as in the
orthogonal case since the coefficients are no longer Fourier coefficients.
However the same method may be used to calculate the coefficients. For
example, suppose we wish to expand a new function Z in terms of two known
factors $F_1$ and $F_2$,

$$Z = a_1 F_1 + a_2 F_2 \qquad (2)$$

If the factors are orthogonal, we find the Fourier coefficients by taking
the inner product of both sides of Equation 2 with each of the factors.
Thus,

$$(Z|F_1) = a_1 (F_1|F_1) + a_2 (F_1|F_2)$$
$$(Z|F_2) = a_1 (F_1|F_2) + a_2 (F_2|F_2) \qquad (3)$$

and when the factors are orthogonal,

$$(F_i|F_j) = \delta_{ij}$$

and

$$a_i = (Z|F_i).$$

However, with oblique factors

$$(F_i|F_j) \neq \delta_{ij}.$$

Thus the equations do not degenerate but must be solved simultaneously.
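
The simultaneous solution for two oblique factors may be sketched as below. The factor vectors and the function Z are hypothetical and the function names are assumptions; the two equations in the inner products are solved by Cramer's rule.

```python
def inner(u, v):
    """Inner product of two sampled functions."""
    return sum(a * b for a, b in zip(u, v))

def expand_in_two_factors(z, f1, f2):
    """Solve the two inner-product equations for a1 and a2."""
    g11, g12, g22 = inner(f1, f1), inner(f1, f2), inner(f2, f2)
    b1, b2 = inner(z, f1), inner(z, f2)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("the two factors are linearly dependent")
    a1 = (b1 * g22 - b2 * g12) / det
    a2 = (b2 * g11 - b1 * g12) / det
    return a1, a2

# Two hypothetical oblique factors of unit length with (F1|F2) = 0.6.
f1 = [1.0, 0.0, 0.0]
f2 = [0.6, 0.8, 0.0]
z = [2.0 * p + 3.0 * q for p, q in zip(f1, f2)]   # Z = 2 F1 + 3 F2

a1, a2 = expand_in_two_factors(z, f1, f2)
print("coefficients:", a1, a2)
print("inner products alone:", inner(z, f1), inner(z, f2))
```

For orthonormal factors the off-diagonal inner product vanishes and the same routine returns the Fourier coefficients directly.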
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Varimax  Rotation  of  41  Personal  History  Variables  -  14  Factors 
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Rotated  Factor  Matrix 
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Section  VIII 


RECOMMENDATIONS 

Besides a survey of factor analysis, theory was extended in the areas
of effects of the number of observations, sampling effects, interpretation
of factors, and communality. There are other areas of factor analysis
for which further study is suggested.

There are many multivariate analysis models which are closely related,
such as intrinsic analysis, Loève-Karhunen, latent structure analysis, and
latent profile analysis models. A comparative study is needed to clarify
the similarities and differences of these models.

Factor analysis packages should be made more adaptive, i.e., more
decisions could be made by the computer, for example, the number of
factors for rotation, the grouping of variables, etc. As a matter of fact,
the computer should handle the data up to the point of naming the factors.
This would make factor analysis available to all scientists with a minimum
effort on their part.
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Appendix  I 


COMPUTER  PROGRAM  WRITE-UPS 


The factor analysis package presented in this appendix
consists of four programs. Abstracts of the four programs
are given first, followed by the write-ups themselves.


A. Factor Analysis Program


This  program  is  a  specialized  version  of  the  A70A  program 
available  from  System  Development  Corp.  which  originated  at 
the  Harvard  Statistical  Laboratory.  A  factor  matrix  is 
computed  using  the  Jacobi  method.  Input  is  restricted  to  a 
Pearsonian  correlation  matrix  read  from  Fortran  binary  tape. 


B. Factor Rotation Program


This  program  is  computationally  identical  to  the  A26D 
program  available  from  System  Development  Corporation.  An 
orthogonal  rotation  is  performed  using  the  Kaiser  Varimax 
criterion.  Input  is  restricted  to  a  Fortran  binary  tape 
prepared by the factor analysis program, SRL-FA1. An accuracy
check is provided by computing and printing the differences
between the original and the final communalities.


C. Oblimax Rotation Program

SRL-OB1 is a general purpose program which transforms the
factor analysis model for a set of orthogonal factors to the
model for a set of oblique ones, i.e., it rotates factors to
a more meaningful oblique set.


Given an orthogonal factor pattern A on binary tape,
the program uses the OBLIMAX criterion to find a transformation
matrix $\Lambda$ and reference structure matrix V such that

$$V = A\Lambda$$

as in Harman (Reference 2, p. 310). The heart of this
rotation is a specialized version of an OBLIMAX rotation
routine obtained from the University of Illinois.

Using V and $\Lambda$, other output forms of the oblique
factor analysis model are then computed, in particular:

P - the new factor pattern
S - the factor structure
$\Phi$ - the matrix of factor correlations

P and $\Phi$ may be written on tape for further use.

D. Factor Scores Estimation Program

This program computes estimated factor scores using the
equation

$$f = \Phi A' R^{-1} z$$

which is (16.2) in Harman (Reference 2, p. 341). Input to
the program consists of the correlation matrix R, the factor
coefficient matrix A, the factor correlation matrix $\Phi$ (if
needed), and the raw scores. Output consists of the estimated
factor scores (both listing and punched cards), as well as
$R^{-1}$, test coefficients for standard scores and for raw scores,
if desired.
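
The estimation equation may be illustrated by a minimal sketch, which is not the Fortran program itself. The inverse of R is not formed explicitly; the system Ry = z is solved by elimination instead (a choice of this sketch). The numerical values and the function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y

def estimate_factor_scores(R, A, Phi, z):
    """Return f = Phi A' R^{-1} z for one subject's standard scores z."""
    y = solve(R, z)                                   # y = R^{-1} z
    n_factors = len(A[0])
    g = [sum(A[i][p] * y[i] for i in range(len(A)))   # g = A' y
         for p in range(n_factors)]
    return [sum(Phi[p][q] * g[q] for q in range(n_factors))
            for p in range(n_factors)]

# Hypothetical two-variable, one-factor illustration.
R = [[1.0, 0.5], [0.5, 1.0]]
A = [[0.8], [0.6]]
Phi = [[1.0]]
z = [1.0, 1.0]
f = estimate_factor_scores(R, A, Phi, z)
print("estimated factor score:", f[0])
```

For orthogonal factors $\Phi$ is the identity matrix and the last multiplication leaves the scores unchanged.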
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A.  Factor  Analysis  Program 

CLASS: Self-Contained General Purpose Program
LANGUAGE: Fortran II
PURPOSE: To compute a factor matrix using the Jacobi method
and write factor loadings on to binary tape.

RESTRICTIONS:

No. of variables    Maximum of 150
Input               Binary tape containing
                    Pearsonian correlation matrix
Output              Available outputs in BCD mode:
                      a) Correlation matrix
                      b) Latent roots and vectors
                      c) Factor loadings
                    Output in binary mode:
                      a) Factor loadings for input to
                         factor rotation program

DESCRIPTION, USE & COMMENTS:

Tape Assignment

Logical Tape:
  2    System BCD input tape.
  3    System BCD output tape.
  5    Correlation matrices in binary mode.
  6    Factor loadings in binary mode.
  9    Used for temporary storage of
       eigenvalues and eigenvectors.

Binary Tape Input Format

Record 1 (2 words)
  Problem number and order of square matrix (N),
  both in integer form.

Record 2 through N+1 (N words each)
  One record for each row of correlation matrix
  (ones in diagonal).

Card Deck Preparation

Each problem to be run requires two data cards as
follows:
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A.  Title  Card 


Col 1        Punch 1
Col 2-72     Any BCD information desired as page headings
             for printed output.

B. Problem Card

Col 1-5      Problem number used to locate proper matrix on
             input tape and to identify BCD output.
Col 6-8      Number of variables in this analysis.
Col 9-10     If all eigenvectors (and consequently factor
             loadings) are to be computed, leave these
             columns blank. Otherwise punch the reduced
             number of eigenvectors desired.
Col 11
  = 1        The correlation matrix with communality
             adjustments is to be printed.
  = 0        This matrix is not to be printed.
Col 12
  = 1        Latent roots and vectors are to be printed.
  = 0        Latent roots and vectors are not to be printed.
Col 13       Estimation of communalities
  = 1        Maximum row element.
  = 2        r² (square of multiple correlation coefficient
             of given variable with all other variables).
  = 3        Unities are retained.
  = 4        Image-covariance factor analysis.
             (Essentially R with appropriate adjustment of
             the off diagonal elements to maintain the
             positive semi-definiteness of the matrix).
Col 14
  = 1        Factor loadings are to be written on logical
             tape 6 for input to factor rotation program.
  = 0        Factor loadings are not to be written.

C. Finish Card

Col 1-6      Punch FINISH

D. Blank Card

Note: Cards C and D must follow the Problem Card
for the final problem.
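
The column layout of the Problem Card may be made concrete with a short sketch that decodes one card image. This is an illustration only and is not part of the Fortran package; the class and function names are hypothetical, and the sample card is invented.

```python
from dataclasses import dataclass
from typing import Optional

# Col 13 codes as listed in the card description above.
COMMUNALITY_METHODS = {
    1: "maximum row element",
    2: "squared multiple correlation",
    3: "unities retained",
    4: "image-covariance factor analysis",
}

@dataclass
class ProblemCard:
    problem_number: int
    n_variables: int
    n_eigenvectors: Optional[int]   # None means "compute all"
    print_correlations: bool
    print_latent_roots: bool
    communality_method: int
    write_loadings: bool

def parse_problem_card(card: str) -> ProblemCard:
    """Decode columns 1-14 of a Problem Card image (hypothetical helper)."""
    card = card.ljust(14)

    def field(first: int, last: int) -> str:
        # Card columns are numbered from 1 and are inclusive.
        return card[first - 1:last].strip()

    reduced = field(9, 10)
    method = int(field(13, 13))
    if method not in COMMUNALITY_METHODS:
        raise ValueError("column 13 must be 1, 2, 3, or 4")
    return ProblemCard(
        problem_number=int(field(1, 5)),
        n_variables=int(field(6, 8)),
        n_eigenvectors=int(reduced) if reduced else None,
        print_correlations=field(11, 11) == "1",
        print_latent_roots=field(12, 12) == "1",
        communality_method=method,
        write_loadings=field(14, 14) == "1",
    )

# Invented card: problem 12, 41 variables, all eigenvectors, unities retained.
card = "00012" + " 41" + "  " + "1" + "0" + "3" + "1"
parsed = parse_problem_card(card)
print(parsed)
print(COMMUNALITY_METHODS[parsed.communality_method])
```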

ROUTINES USED IN SRL-FA1

REWV    This routine is provided to rewind and unload
        tapes.

FMLEV   This routine determines the eigenvalues and
        eigenvectors of a symmetric matrix. It is
        one of several eigenvector routines which
        are available from SHARE.
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B.  Factor  Rotation  Program 

CLASS: Self-contained General Purpose Program
LANGUAGE: Fortran II
PURPOSE: To perform an orthogonal rotation using the
Kaiser Varimax criterion.

RESTRICTIONS:

Number of variables    250 maximum
Number of factors      50 maximum

DESCRIPTION, USE & COMMENTS:

Tape Assignment

Logical Tape:
  2    System BCD input tape.
  3    System BCD output tape.
  6    Binary input of factor loadings from SRL-FA1.

Card Deck Preparation

A. Problem Card

Each problem to be run requires a single control
card as follows:

Col 1-5      Problem number used to locate proper factor
             loadings on input tape and to identify
             BCD output.
Col 6-8      Number of variables.
Col 9-10     Number of factors to be read from tape
             and rotated.

B. Finish Card

A series of problem cards is followed by the
following card to signify that all problems
desired have been run:

Col 1-5      Punch 09999

SUBROUTINES:

REWV    This routine is employed to rewind and
        unload the binary input tape.
C. Oblimax Rotation Program


CLASS: Self-Contained General Purpose Program
LANGUAGE: Fortran II
PURPOSE: To rotate orthogonal factors to a set of oblique factors
using the OBLIMAX criterion and to compute various output forms of
the model.

RESTRICTIONS:

Matrix size    No. of variables plus no. of factors ≤ 130.
Input          Binary tape containing orthogonal factor pattern.
Output         Available in BCD mode
                 1) Transformation matrix
                 2) Reference structure
                 3) Reference vector correlations
                 4) Reciprocals and inverses of elements of
                    diagonal matrix
                 5) (Primary) factor pattern
                 6) (Primary) factor correlations
                 7) (Primary) factor structure
               Binary mode
                 1) (Primary) factor correlations for input to
                    second order factor analysis
                 2) (Primary) factor pattern

DESCRIPTION, USE & COMMENTS:

Tape Assignment

Logical Tape:
  2    System BCD input tape.
  3    System BCD output tape.
  6    Factor pattern input and output in binary mode.
  5    Factor correlation output in binary mode.

Binary Tape Input Format

Record 1 (3 words)
  Problem number, no. variables (NVAR), no. factors (NFAC).

Record 2 through NFAC + 1 (1 + NVAR words each)
  One record for each column of factor pattern.
  Each record contains one dummy word followed by NVAR loadings.

Record NFAC + 2 (3 A6 words for each variable)
  Variable names.

Binary Tape Output Format

Factor Correlations (Tape 5)

Record 1 (4 words)
  Problem number, 2 dummy variables, no. factors (NFAC).

Record 2 through NFAC + 1
  One record for each row of the correlation matrix.

Record NVAR + 2
  Factor names. The values 1 through NFAC are set up in the
  3 A6 words for factors 1 through NFAC respectively.

Factor Pattern (Tape 6)

  Same as binary input format.

Card Deck Preparation

Each problem to be run requires two data cards as follows:

A. Problem Card

Col 1-5      Problem number used to locate proper matrix on
             input tape and to identify printed output.
Col 6-8      No. variables in this pattern.
Col 9-10     No. factors to be rotated.
Col 12-14    BCD output parameter.
             Form the sum for desired output:
               100 - V, reference structure
               010 - Φ, factor correlations
               001 - S, factor structure
               blank - P, factor pattern (always given).
             Other options
               200 - all output listed on page [2].
                     Modified format where available.
               400 - all output listed on page [2].
                     Modified format (PWRITE) and original
                     output (six-place accuracy).
Col 16-20    JOB NUMBER to be written with factor correlation
             matrix on tape 5 for use in locating it.
               blank - correlation matrix will not be
                       written on tape 5.
Col 22-25    JOB NUMBER to be written with factor pattern
             on tape 6.
               blank - factor pattern will not be written
                       on tape 6.
Col 28       Leave blank unless starting new binary tape.
               1 - start new tape on logical unit 5
               2 - start new tape on logical unit 6
               3 - start both new.
Col 30         1 - the variables are to be normalized during
                   rotation (made of length 1 in common factor
                   space to insure that structure values
                   indicate angular closeness of fit).
               blank - communality of variables left unchanged.
Col 31-34    E4.2 conversion. An asterisk (*) appears on BCD
             (printed) output beside values whose absolute
             value is greater than or equal to this number.
               blank - the value 0.35 is used.
               2.0 - no *'s; three-place accuracy.

B. Title Card

Col 1-78     Any BCD information will be written at the top
             of every page.
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C.  Finish  Card 


Last  card  in  deck. 


Col  1-5 


ROUTINES USED IN SRL-OB1

PREAD    Reads the binary input factor pattern.

PWRITE   Prints the selected BCD output.

RWRITE   Writes Φ and P on tape.

OBMAX    Performs the rotation and calculates the output.

GENINV   A routine for symmetric matrix inversion.

MATHEMATICAL  NOTES 

1. A mathematical explanation of the OBLIMAX rotation process may
be found in Harman (Reference 2, p. 310-319). The treatment is sketchy
in one respect and the following extension may be helpful for a complete
understanding of this program. Harman's terminology will be
used.

OBLIMAX tries to maximize a function on the elements of the
Reference Vector Structure matrix by an iterative process which
successively maximizes the function in each of the planes formed by
each pair of reference vectors. Although the end result is a transformation
from orthogonal vectors to oblique ones, before the end
of the first pass we must consider planes formed by, and transformations
on, oblique vectors.


Therefore let us examine the general case of factor rotation
in the plane of the $\Lambda_j$, $\Lambda_k$ vectors. We are looking for a
transformation which maximizes a function on the values $v'_{ij}$, $i = 1, \ldots, n$
(see 15.4, Harman), where

$$v'_{ij} = \lambda_{11} v_{ij} + \lambda_{21} v_{ik} \qquad (1)$$

and where $v_{ij}$ is the correlation between variable i and reference
vector j, or $(Z_i|\Lambda_j)$. But note that (1) does not define unique $\lambda_{11}$
and $\lambda_{21}$ but a "line" of them. This is a reflection of the fact
that in the oblique case the structure alone does not determine the
factor analysis description of common factor space. Such a description
requires two of the several related matrices. Thus in the
$\Lambda_j$, $\Lambda_k$ plane we must use $\lambda_{11}$, $\lambda_{21}$ to transform one more set of values,
i.e., find another equation consistent with (1). The most practical
solution is to let $\lambda_{11}$, $\lambda_{21}$ transform the old reference vectors
into the new one:

$$\Lambda'_j = \lambda_{11}\Lambda_j + \lambda_{21}\Lambda_k \qquad (2)$$

As $\Lambda'_j$ is not yet known we may only deduce by squaring

$$(\Lambda'_j|\Lambda'_j) = 1 = \lambda_{11}^2 + \lambda_{21}^2 + 2\lambda_{11}\lambda_{21}(\Lambda_k|\Lambda_j).$$

Likewise as $v'_{ij}$ is not yet known we must find the set of values
$(\lambda_{11}, \lambda_{21})$ for which

$$K = \frac{\sum_i v'^{\,4}_{ij}}{\left(\sum_i v'^{\,2}_{ij}\right)^2} = \frac{\sum_i (\lambda_{11}v_{ij} + \lambda_{21}v_{ik})^4}{\left[\sum_i (\lambda_{11}v_{ij} + \lambda_{21}v_{ik})^2\right]^2}$$

is a maximum. For convenience let

$$\lambda_{21} = x\,\lambda_{11} \qquad (3)$$

Then $\lambda_{11}v_{ij} + \lambda_{21}v_{ik} = (v_{ij} + v_{ik}x)\lambda_{11}$ and we may simply

$$\max K = \frac{\sum_i (v_{ij} + v_{ik}x)^4}{\left[\sum_i (v_{ij} + v_{ik}x)^2\right]^2}$$

for x because the $\lambda_{11}$'s factor out and cancel.

Now combining (2) and (3)

$$1 = \lambda_{11}^2 + x^2\lambda_{11}^2 + 2\lambda_{11}^2\,x\,(\Lambda_j|\Lambda_k)$$

and solving for $\lambda_{11}$ and $\lambda_{21}$ yields

$$\lambda_{11} = \frac{1}{\sqrt{x^2 + 2x(\Lambda_j|\Lambda_k) + 1}}, \qquad \lambda_{21} = \frac{x}{\sqrt{x^2 + 2x(\Lambda_j|\Lambda_k) + 1}} \qquad (4)$$

OBLIMAX, after finding x, finds the denominator in (4) by
normalizing the vector $\Lambda_j + x\Lambda_k$, where $\Lambda_k$ and $\Lambda_j$ are the $k^{\text{th}}$
and $j^{\text{th}}$ columns of the "total" transformation matrix $\Lambda$ (from the
original set of orthogonal reference vectors to the set of oblique
ones). An "updated" transformation matrix may then be generated by

$$\Lambda'_j = \lambda_{11}\Lambda_j + \lambda_{21}\Lambda_k .$$

Of course OBLIMAX provides two values for x and the same procedures
are applied, using the other value to find the new kth column of $\Lambda$
as well, thus rotating both the kth and jth reference axes.
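
The criterion as a function of x, and the normalization that supplies the denominator in (4), may be sketched as follows. The search for the maximizing x is not shown; the value of x, the sample vectors, and the function names are all assumptions made for illustration.

```python
import math

def oblimax_criterion(v_j, v_k, x):
    """K(x): sum of fourth powers over the squared sum of squares
    of the trial values v_ij + x * v_ik."""
    trial = [a + x * b for a, b in zip(v_j, v_k)]
    return sum(t ** 4 for t in trial) / sum(t ** 2 for t in trial) ** 2

def transformation_coefficients(x, cos_jk):
    """lambda_11 and lambda_21 of Equation (4); cos_jk is (Lambda_j | Lambda_k)."""
    norm = math.sqrt(x * x + 2.0 * x * cos_jk + 1.0)
    return 1.0 / norm, x / norm

def update_column(col_j, col_k, x):
    """New j-th column of the transformation: normalized Lambda_j + x * Lambda_k."""
    cos_jk = sum(a * b for a, b in zip(col_j, col_k))
    l11, l21 = transformation_coefficients(x, cos_jk)
    return [l11 * a + l21 * b for a, b in zip(col_j, col_k)]

# Hypothetical unit-length reference vectors with (Lambda_j | Lambda_k) = 0.6.
lam_j = [1.0, 0.0]
lam_k = [0.6, 0.8]
x = 0.5                       # assumed to come from the maximization step
new_j = update_column(lam_j, lam_k, x)
length = math.sqrt(sum(c * c for c in new_j))
print("new column:", new_j, "length:", length)

# K depends only on x, not on the common scale of the trial values.
v_j, v_k = [0.8, 0.1, 0.5], [0.1, 0.7, 0.4]
k_value = oblimax_criterion(v_j, v_k, x)
k_scaled = oblimax_criterion([3 * a for a in v_j], [3 * b for b in v_k], x)
```

The scale invariance shown in the last lines is the reason the common multiplier may be dropped when maximizing with respect to x.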

When the rotation is finished, OBLIMAX has produced a transformation
$\Lambda$ with a double use: it transforms the initial orthogonal
reference structure (equivalent to the factor pattern in the
orthogonal case) to an oblique reference structure

$$V = A\Lambda$$

and it transforms the initial factors into the new oblique reference
vectors. Hence $\Lambda$ contains in its columns the direction cosines
of the new reference axes, using the initial set as an orthogonal
basis.

The program then computes the matrix of correlations $\psi$ between
reference vectors

$$\psi = \Lambda'\Lambda.$$

The transformation $\Lambda$ from the orthogonal factors to the
reference vectors and a hypothetical transformation $T$ from the
original factors to the new set of oblique factors are related by

$$D = T'\Lambda \qquad (5)$$

where $D$ is the diagonal matrix of the scalar products, or correlations,
between vectors $T_p$ and $\Lambda_p$ ($p = 1, \ldots, n$). Because $\Lambda_p$
is defined as the vector normal to the hyperplane of all factors
$T_q$, $q \neq p$, it is uncorrelated with every factor except $T_p$, and
hence $D$ is diagonal.

From (5)

$$T' = D\Lambda^{-1}$$

tells us that $T'$ may be calculated from $\Lambda^{-1}$ by normalizing its
rows, since the rows of $T'$ are normalized and left multiplication
by a diagonal matrix is equivalent to multiplying each $j$th row by
the element $d_j$. To normalize $\Lambda^{-1}$ by rows we may multiply each
row by the reciprocal of the square root of the diagonal elements of

$$\Lambda^{-1}(\Lambda^{-1})' = \Lambda^{-1}(\Lambda')^{-1} = (\Lambda'\Lambda)^{-1} = \psi^{-1} \qquad (6)$$

So OBLIMAX simply inverts $\psi$ and finds the elements of $D$ as
explained above. It then finds the oblique factor pattern $P$ and
matrix of correlations between factors $\phi$ by the formulas derived
here.

$$\phi = T'T = D\Lambda^{-1}(D\Lambda^{-1})' = D\Lambda^{-1}(\Lambda^{-1})'D' = D(\Lambda'\Lambda)^{-1}D$$

$$\phi = D\psi^{-1}D$$

$$P = S\phi^{-1} = (AT)\phi^{-1} = AT(T'T)^{-1} = ATT^{-1}(T')^{-1} = A(T')^{-1}$$

$$V = A\Lambda, \qquad A = V\Lambda^{-1}$$

$$P = V\Lambda^{-1}(T')^{-1} = V(T'\Lambda)^{-1} = VD^{-1}$$

$$P = VD^{-1}$$

Finally, the factor structure may be computed

$$S = P\phi.$$
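The chain of formulas from the transformation matrix to the pattern and structure can be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical values for the initial loadings A and the transformation (called Lam here); the function name is invented for the illustration and is not part of OBLIMAX.

```python
import numpy as np

def oblique_solution(A, Lam):
    """A: initial orthogonal loadings (n x m); Lam: transformation whose
    columns are unit-length reference vectors (m x m)."""
    V = A @ Lam                              # oblique reference structure
    psi = Lam.T @ Lam                        # correlations between reference vectors
    psi_inv = np.linalg.inv(psi)
    d = 1.0 / np.sqrt(np.diag(psi_inv))      # elements of D, from the diagonal of psi^-1
    D = np.diag(d)
    T = np.linalg.inv(Lam).T @ D             # from T' = D Lam^-1
    phi = D @ psi_inv @ D                    # correlations between factors
    P = V @ np.diag(1.0 / d)                 # factor pattern, P = V D^-1
    S = P @ phi                              # factor structure
    return V, psi, D, T, phi, P, S

# Hypothetical loadings and a transformation with unit-length columns.
A = np.array([[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.3, 0.8]])
Lam = np.array([[1.0, 0.6], [0.0, 0.8]])
V, psi, D, T, phi, P, S = oblique_solution(A, Lam)
```

In this sketch the structure S agrees with A T, and the factor correlation matrix has ones on its diagonal, which is what the normalization of the rows of T' is meant to guarantee.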

NOTE: the program has an option to normalize variables in common
factor space during rotation. Then the OBLIMAX function is maximized
on the projections of the normalized (unit-length) variable vectors
instead of on

$$v_{ij} = (Z_i \mid \Lambda_j), \qquad i = 1, \ldots, n.$$

This change eliminates the effects of differing variable communalities,
making angular closeness of fit the determining factor.
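As an illustration of the normalization option, the sketch below scales each row of a hypothetical loading matrix to unit length before projecting onto the reference vectors. It assumes that "normalizing in common factor space" means dividing each variable's loadings by the square root of its communality; the function and variable names are invented for the example.

```python
import numpy as np

def normalize_rows(A):
    """Scale each variable (row of A) to unit length in the common-factor space.

    Returns the scaled loadings and the row lengths h_i, i.e. the square
    roots of the communalities.
    """
    h = np.sqrt(np.sum(A**2, axis=1))
    return A / h[:, None], h

# Hypothetical loadings and transformation (unit-length columns).
A = np.array([[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.3, 0.8]])
Lam = np.array([[1.0, 0.6], [0.0, 0.8]])
A_unit, h = normalize_rows(A)
V_unit = A_unit @ Lam      # each entry is the cosine of a variable/reference-axis angle
```

Multiplying row i of V_unit by h_i recovers the unnormalized reference structure, so the option changes only the weight each variable carries in the criterion.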
D. Factor Scores Estimation Program


CLASS: Self-Contained General Purpose Program

LANGUAGE: Fortran II

PURPOSE: To estimate factor scores using the equation

$$f = \phi A' R^{-1} Z$$

where A is the n x m matrix of common factor coefficients, R is the
n x n matrix of correlations (unity in the diagonal), Z is the n x N
matrix of standardized scores, and f is the m x N matrix of estimated
factor scores. $\phi$ is an m x m matrix of factor correlations; it is
not used in orthogonal solutions.
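The estimation equation can be sketched with NumPy as follows. This is an illustration only, not the Fortran II program: the function name and test values are hypothetical, and a linear solve is used in place of an explicit inverse of R (the program itself computes R inverse through its INVERT routine).

```python
import numpy as np

def estimate_factor_scores(A, R, Z, phi=None):
    """Estimate f = phi A' R^-1 Z.

    A: n x m loadings, R: n x n correlations, Z: n x N standardized scores,
    phi: m x m factor correlations, or None for an orthogonal solution.
    """
    A, R, Z = (np.asarray(M, dtype=float) for M in (A, R, Z))
    f = A.T @ np.linalg.solve(R, Z)      # A' R^-1 Z without forming R^-1
    if phi is not None:
        f = np.asarray(phi, dtype=float) @ f
    return f

# Hypothetical two-variable, one-factor, three-subject example.
A = np.array([[0.8], [0.6]])
R = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = np.array([[1.0, -0.5, 0.2], [0.3, -1.2, 0.9]])
f = estimate_factor_scores(A, R, Z)
```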

RESTRICTIONS:

No. of variables    Maximum 90
No. of subjects     Maximum 90


DESCRIPTION, USE & COMMENTS:


Tape  Assignment 


Logical  Tape: 
2 
3 
6 


System BCD input tape.
System BCD output tape.
Original correlations and factor correlations in binary mode.
Factor loadings in binary mode.
Raw data in binary mode.
Factor scores in BCD mode for punching.


Card  Deck  Preparation 


Each run requires the following cards:

A. Title Card

   Col 1-78    Any BCD information desired as page headings.

B. Problem Card 1

   Col 1-5     Problem number used to locate original correlations on
               input tape and identify BCD output.
   Col 6-10    Number used to find factor loadings.
   Col 11-15   If an oblique solution, enter the number which identifies
               the factor correlations on tape. If orthogonal, leave
               blank.
   Col 16-18   Number of factors for which factor scores are to be
               computed. Must be equal to or less than the number on
               tape.
   Col 19-21   Number of tape batteries of raw data making up variable
               set.
   Col 22      = 1  R inverse is printed.
               = 0  R inverse is not printed.
   Col 23      = 1  Test coefficients (standard scores) are printed.
               = 0  Not printed.
   Col 24      = 1  Test coefficients (raw scores) are printed.
               = 0  Not printed.

C. Problem Card 2

   Col 1-78    File identification in 6 col fields as indicated in
               Col 19-21 of previous card. Maximum of 13.

Repeat above cards for each job.

D. Finish Card

   Col 1-6     Punch FINISH

E. Blank Card

F. Blank Card
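The column layout of Problem Card 1 can be illustrated by building an 80-column card image in Python. The field widths follow the layout given for that card; the function name, the right-justification of the integer fields, and the sample values are assumptions made for this sketch and should be checked against the program's input formats before use.

```python
def problem_card_1(problem, loadings_id, n_factors, n_batteries,
                   factor_corr_id=None, print_r_inv=False,
                   print_standard=False, print_raw=False):
    """Build an 80-column image of Problem Card 1.

    Integers are right-justified in their fields (an assumption); the
    factor-correlation field is left blank for an orthogonal solution.
    """
    corr = " " * 5 if factor_corr_id is None else f"{factor_corr_id:>5d}"
    card = (f"{problem:>5d}"          # Col 1-5
            f"{loadings_id:>5d}"      # Col 6-10
            f"{corr}"                 # Col 11-15
            f"{n_factors:>3d}"        # Col 16-18
            f"{n_batteries:>3d}"      # Col 19-21
            f"{int(print_r_inv)}"     # Col 22
            f"{int(print_standard)}"  # Col 23
            f"{int(print_raw)}")      # Col 24
    return card.ljust(80)

# Hypothetical orthogonal job: problem 12, loadings file 7, 3 factors, 2 batteries.
card = problem_card_1(12, 7, 3, 2, print_r_inv=True)
```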


ROUTINES  USED  IN  SRL-FS: 

INVERT     Computes R inverse.

TION 11    Positions tape 11 at correct raw data file.

LSHFT      Shifts integer numbers into FORTRAN II format since tape 11
           is written in FORTRAN IV.
Appendix II

TIME  FUNCTIONS  OF  COMPUTATION 

In this appendix we present a compilation of data which will facilitate
estimation of computation times on various computers. The factor analysis
techniques may be described in terms of the basic matrix operations: sum,
product, inversion, and eigenvalue and eigenvector computation. The following
table gives the computation time, where $\mu$, $\delta$, and $\alpha$ are the
multiplication, division, and addition times, respectively, for a given computer.

1. Computation of all eigenvalues and eigenvectors of matrix
$A_{N \times N}$ by the Jacobi method (Reference 64):

$$T = 10N^3\mu + 20N^3\alpha$$

2. Inversion of a symmetric matrix $A_{N \times N}$ by bordering:

T  =  N2(N-l)u  +  i-N(N2  +  2)6  +  kN-l)(4N2  -  N  +  15)a 
0  o 

3. Multiplication of $A_{N \times M} \cdot B_{M \times P}$:

$$T = NPM\mu + NP(M-1)\alpha$$

4. Addition of $A_{N \times M} + B_{N \times M}$:

$$T = NM\alpha$$


5. Computation of all eigenvalues and eigenvectors of matrix
$A_{N \times N}$ by the Householder-Ortega-Wilkinson method:

$$T = 0.00162\,N^2,$$

where T is the time in minutes on the IBM 704 computer.
Application of this equation to another computer will require
multiplication by a scale factor which reflects the ratio of
speed of the other computer compared to the IBM 704. This empirical
equation was derived by least squares methods from data given in
Reference 65.
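For planning purposes the operation counts for the Jacobi method, matrix multiplication, matrix addition, and the Householder fit can be wrapped in small functions. The sketch below is illustrative: the function names are invented, the per-operation times passed in are hypothetical, and the `scale` argument stands for the speed ratio mentioned in the text, whose exact convention is an assumption here.

```python
def jacobi_time(n, mu, alpha):
    """All eigenvalues and eigenvectors of an n x n matrix by the Jacobi method."""
    return 10 * n**3 * mu + 20 * n**3 * alpha

def matmul_time(n, m, p, mu, alpha):
    """Product of an n x m matrix with an m x p matrix."""
    return n * p * m * mu + n * p * (m - 1) * alpha

def matadd_time(n, m, alpha):
    """Sum of two n x m matrices."""
    return n * m * alpha

def householder_minutes(n, scale=1.0):
    """Empirical IBM 704 fit, in minutes; scale adjusts for another computer."""
    return 0.00162 * n**2 * scale

# Hypothetical machine: 1 microsecond per multiplication and per addition.
t_jacobi = jacobi_time(10, 1e-6, 1e-6)
t_product = matmul_time(2, 3, 4, 1.0, 1.0)
t_704 = householder_minutes(100)
```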


Appendix  III 

DESIGN  OF  A  FACTOR  ANALYSIS 


1.  Data  Collection 

There  are  features  and  properties  of  factor  analysis  which  are 
learned  from  experience  by  users,  but  which  are  rarely  written  into 
textbooks  on  the  subject.  The  purpose  of  this  appendix  will  be  to 
touch on some of these features.

Factor analysis is performed on data which, geometrically speaking,
consist of N points each situated in n-dimensional space. The purpose
of factor analysis is to describe the shape of the set of N points as
comprehensively and briefly as possible through mathematical shorthand.

In  this  framework,  some  of  the  shortcomings  of  factor  analysis 
can  be  described.  In  the  first  place,  only  the  correlation  between 
pairs  of  variables  is  used  to  describe  the  raw  data.  This  constitutes 
a  drastic  reduction  of  the  data  into  very  few  numbers.  If  N=200  in 
ordinary three dimensional space, then forming the correlations involves
reducing 600 numbers into only 3 numbers. Factor analysis reverses
this process, and from these 3 numbers manufactures 3 characteristic
roots and 9 characteristic vector elements. Evidently the entire
process depends on how adequately all the information in the 600 numbers
can be condensed and contained in only 3 numbers.
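The bookkeeping in this example is easy to reproduce. The sketch below uses synthetic random data (an assumption made purely for illustration) to show 600 raw numbers collapsing into 3 distinct correlations, from which 3 characteristic roots and 9 vector elements are then produced.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for a repeatable illustration
N, n = 200, 3
X = rng.normal(size=(N, n))             # synthetic data: N points in n-dimensional space
R = np.corrcoef(X, rowvar=False)        # n x n product-moment correlation matrix
pairs = n * (n - 1) // 2                # distinct correlations off the diagonal
roots, vectors = np.linalg.eigh(R)      # characteristic roots and vectors of R
```

Whatever the data, the roots of a correlation matrix sum to n, so everything the later analysis can report is already fixed by those few correlations.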

These three numbers are the product-moment correlations between
the variables. These are, to begin with, pairwise expressions. They
take each pair of variables (1 and 2, 2 and 3, 1 and 3) and presume to
describe in one numerical quantity what the relationship is between each
pair. It becomes clear that much of the important information about
the shape of the set of N points may be lost. It will depend, of
course, on the shape of the set. The implication is clear: look
at the data out of which the correlations are being calculated. It
is not feasible to try to make 3 dimensional sketches, and besides,
there will usually be far more than just three variables. From n
variables there will be n(n-1)/2 different pairs of variables, and
the same number of correlation coefficients. Even plotting out all
these graphs will be a major job, and for practical purposes it will
be sufficient to plot only a portion of all the N points defined by the
available data.

What  should  the  researcher  be  looking  for?  There  are  three  basic 
danger  signs  to  look  for. 

a. Outliers: data points which don't belong in the set, either
because of incorrect collection or copying of data or because the data
are irrelevant.

b. Multiple populations: data points will be found to form two
clusters in some graphs, in which case a different factor analysis for each
cluster will be necessary. In practical problems the difference will be
due to some observable fact such as differences of sex, production line,
experimental technique, etc., which was initially ignored because it was
considered unimportant for purposes of this analysis.

A  more  difficult  danger  sign  in  this  connection  is  the  presence  of 
multiple  populations  not  separated  by  distance.  The  only  way  to  spot  this 
is  to  go  back  to  the  raw  data  whenever  a  graph  is  found  whose  points  follow 
an  X,  Y,  or  V  shaped  pattern.  The  purpose  will  be  to  see  whether  points  on 
the one leg of the V have any other feature in common. No rigid rules
can  be  given  here.  The  picture  will  never  be  as  clear-cut  as  is  suggested 
here,  and  only  experience  can  guide  the  researcher  into  those  habits  and 
practices  of  data  examination  which  ferret  out  suspicious  weaknesses  in 
the  original  design  of  data  collection. 

c. Curvilinearity of data: the product-moment correlation coefficient
measures the strength of relationship between two variables only if that
relationship is linear. If the graph of the data plots into the shape of
a C or S, then the whole projected factor analysis should be stalled
at least temporarily until a statistician can be shown the data. The
various options which might be recommended by him at this point go beyond
the scope of this study.
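The danger named under curvilinearity can be made concrete with a constructed example. In the sketch below (synthetic data, chosen only for illustration) one variable is an exact function of the other, yet the product-moment correlation is essentially zero because the relationship is not linear.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)     # values placed symmetrically about zero
y = x**2                            # exact, but curvilinear, dependence of y on x
r = np.corrcoef(x, y)[0, 1]         # product-moment correlation
```

A plot of y against x would show the curve at once, while the coefficient alone suggests no relationship at all.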

These are the major danger signals. There are others, such as
heteroscedasticity (data points pinched together at some places on the
graph and spread out at others), but here again the investigator should
be guided by the general warning: if anything looks suspicious, ask
about it.
The actual calculation of the product-moment correlation
coefficient is described in Section 2, and will not be spelled
out  here.  What  is  not  so  frequently  described,  and  often  badly 
needed,  is  advice  about  avoiding  biases  due  to  improper  data 
collection. 

The  framework  for  such  a  description  must  begin  with  the 
classical  distinction  between  population  and  sample.  Ideally, 
we  might  want  to  construct,  for  each  pair  of  variables,  the 
population correlation coefficient. For practical purposes this
would  be  unwise  in  most  cases.  If  only  because  the  labor, 
editing,  and  error  control  would  be  so  demanding,  we  would  be 
led  to  sample. 

It  is  in  defining  this  sample  that  bias  is  apt  to  enter, 
particularly  since  any  investigator  is  initially  prone  to  the 
temptation  to  feel  that  a  big  correlation  is  a  good  correlation. 

It  is  only  with  experience  that  an  investigator  comes  to  accept 
the  statistical  standard  that  the  population  correlation,  or  an 
unbiased  approximation  to  it,  is  the  only  good  correlation.  To 
bias  a  correlation  coefficient,  it  is  necessary  only  to  remove 
a  few  observations  from  the  middle  of  the  set  of  observations, 
and  since  most  observations  will  be  in  the  middle  in  any  case, 
such  a  removal  will  not  seem  particularly  unprofessional. 
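The effect of discarding middle observations can be demonstrated on synthetic data. The sketch below is an exaggerated illustration under stated assumptions: the data are simulated bivariate normal values with a population correlation of 0.5, and every observation within one standard deviation of the mean of x is dropped, which is far more than "a few".

```python
import numpy as np

rng = np.random.default_rng(1)                          # arbitrary seed
x = rng.normal(size=5000)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=5000)     # population correlation 0.5
r_full = np.corrcoef(x, y)[0, 1]

keep = np.abs(x) > 1.0                                  # discard the "middle" of the set
r_trimmed = np.corrcoef(x[keep], y[keep])[0, 1]
```

The trimmed sample yields a noticeably larger coefficient than the full sample, which is exactly the "big correlation" the text warns against pursuing.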

The  professional  standard  which  will  be  adhered  to  is  the 
criterion  of  random  sampling — each  data  set  should  have  the  same 
chance  of  having  its  data  incorporated  into  the  computations  as 
any  other  data  set.  Whether  this  is  accomplished  by  strict  random 
sampling,  systematic  sampling,  or  cluster  sampling  is  irrelevant 
here — it  is  the  criterion  which  must  be  strictly  adhered  to  if  the 
sample  correlation  coefficient  is  to  contain  all  the  information 
that  it  can  about  the  population  coefficient. 

Another  important  issue  in  connection  with  sampling  is  that 
of  sample  size.  How  large  a  sample  ought  one  to  take?  Here  again 
no  attempt  will  be  made  to  repeat  the  technical  approach  taken  by 
most  textbooks,  but  to  deal  in  terms  of  insights.  There  is  a 
popular feeling that something called a "law of averages" exists.
Among non-professional people, this law exists as a feeling that
something ought to happen, and few people would dare to try to
formulate the "law of averages" specifically, in the sense that
they might formulate the law of gravity or Archimedes' principle
explicitly. Part of the reason is that certain key concepts such
as variance are not part of common knowledge, and that an explicit
formulation of the law of averages requires this concept.

The  best  that  can  be  done  to  formulate  the  law  of  averages 
without  using  the  idea  of  variance  is  to  say  that  an  average 
(height,  weight,  etc.)  will  be  "improved"  if  it  is  based  on  more 
and  more  observations.  When  the  law  is  formulated  explicitly 
it  appears  that  this  "improvement"  is  subject  to  another  law, 
commonly  referred  to  as  the  "law  of  diminishing  returns".  More 
specifically,  it  says  that  bringing  in  more  observations  does 
improve the accuracy of an average, but that the hundredth observation
does not contribute as much as the tenth observation, and the
thousandth  observation  contributes  even  less. 
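The diminishing return can be put in numbers using the standard error of an average, which for independent observations with standard deviation sigma is sigma divided by the square root of the number of observations. The sketch below assumes independence and unit standard deviation; the function name is invented for the illustration.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of an average of n independent observations."""
    return sigma / math.sqrt(n)

# Reduction in standard error contributed by the 10th, 100th and 1000th observation.
gain_10th = standard_error_of_mean(1.0, 9) - standard_error_of_mean(1.0, 10)
gain_100th = standard_error_of_mean(1.0, 99) - standard_error_of_mean(1.0, 100)
gain_1000th = standard_error_of_mean(1.0, 999) - standard_error_of_mean(1.0, 1000)
```

Each gain is positive, but each is much smaller than the one before it.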

These laws also apply to estimating a product-moment correlation
coefficient. The larger the sample, the better the coefficient will
probably be. However, successive observations contribute less and less to
the goodness of the estimate. (These are only crude statements of
the situation, and are intended to be only a first approximation
to the kind of formulation which would satisfy a professional
statistician.)

The  actual  rate  of  convergence  of  the  sample  correlation 
coefficient  to  its  true  population  value  cannot  be  simply  described, 
since  it  depends  on  what  the  true  value  is.  If  the  true  correlation 
is  high,  only  a  small  sample  is  needed,  whereas  if  it  is  near  zero 
a  large  sample  will  be  required.  Since  in  factor  analysis  the  one 
sample  we  draw  will  have  to  serve  for  estimating  many  correlations, 
it  seems  desirable  to  concentrate  only  on  those  correlations  where 
we  are  likely  to  be  in  trouble,  that  is,  cases  of  zero  correlation 
in  the  population. 
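A rough feel for the dependence on the true value can be had from the familiar large-sample approximation for the standard error of a correlation coefficient, (1 - rho^2) / sqrt(N). The sketch below rests on that approximation, which assumes bivariate normal data and a reasonably large N (neither assumption is stated in the text); the function names and the target precision are hypothetical.

```python
import math

def approx_se_of_r(rho, n):
    """Large-sample standard error of r under bivariate normality."""
    return (1.0 - rho**2) / math.sqrt(n)

def sample_size_for_se(rho, target_se):
    """Smallest N for which the approximate standard error meets the target."""
    return math.ceil(((1.0 - rho**2) / target_se) ** 2)

n_zero = sample_size_for_se(0.0, 0.05)   # true correlation near zero
n_high = sample_size_for_se(0.9, 0.05)   # high true correlation
```

Under this approximation a zero correlation needs a sample in the hundreds to reach a precision that a correlation of 0.9 reaches with a small fraction of that number, which is why planning around the zero-correlation case is the conservative choice.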

2.  Basic  Requirements  for  a  Factor  Analysis 

The  first  issue  facing  the  investigator  will  be  that  of  deciding 
whether factor analysis is at all relevant to the problem facing him.
Consultation with a professional factor analyst is of course the best
advice that can be given, but in certain situations it may be safe
to proceed with no more than the guidance given here.

Factor  analysis  was  first  employed  in  personality  testing  and 
intelligence  testing,  and  the  conditions  required  for  using  it  can 
be  described  with  reference  to  an  analogous  situation  from  psychology. 
The  reader  can  then  decide  for  himself  whether  these  conditions  apply 
to the experimental data he is faced with, whether from an assembly
line, an electrocardiograph, or a radar or radio signal full of
unwanted noise.

First,  all  the  variables  must  be  results  rather  than  causes. 

They  must  be  analogous  to  school  examination  results  from  different 
subjects — mathematics,  physics,  music.  If  any  of  the  variables 
are  causes — such  as  parents'  I.Q.  or  education,  pre-school  play 
habits,  etc. — and  the  purpose  of  the  study  is  to  find  the  relation 
between  causes  and  effects,  then  factor  analysis  is  not  the  proper 
technique.

Secondly,  the  investigator  should  ask  himself  whether  the  kind 
of  answer  provided  by  factor  analysis  will  be  at  all  relevant  to 
the  question  he  is  posing  as  he  looks  at  the  data.  That  answer, 
in  the  school  analogy,  will  be  something  to  this  effect:  there  is 
one  factor  with  high  weighting  on  all  subjects,  a  second  with  high 
weighting  on  mathematics  and  physics  and  negative  weighting  on  music 
appreciation.  It  will  be  up  to  the  investigator  to  discover  or  decide 
that  the  first  factor  is  general  intelligence  and  the  second  is 
scientific  aptitude.  But  it  must  be  kept  in  mind  that  this  kind 
of  answer  may  not  be  what  is  really  wanted.  If  the  investigator 
is  really  interested  in  deciding  who  should  be  admitted  to  college, 
or  whether  boys  differ  from  girls  in  scientific  ability,  then  he 
should look for new or different analytic techniques. Factor analysis
should never be undertaken solely because the data are in the
proper form for factor analyzing. Any data processing technique such
as factor analysis should be treated as relevant or irrelevant
depending on what problem is being posed, what hypothesis is being
tested, and why the data are being collected in the first place.

Assuming  that  the  two  foregoing  conditions,  absence  of  causal 
data  and  relevance  of  factor  analysis,  have  been  met,  we  may  now 
turn  to  issues  of  proper  data  collection.  Analysis  is  bound  to 
be  better  if  good  data  are  collected,  and  irrelevant  data  rejected. 

Good  data  will  have  the  following  characteristics: 

a. Completeness: Each data set will contain one observation
on each of the variables incorporated. This condition is not
absolutely essential, but it eases the computation burden
considerably, whether calculations are performed on desk calculators or
electronic computers.

A trivial and unrealistic example will show how one must
proceed. Suppose the letter x represents a missing observation,
and the data consist of six data sets each of five variates, namely
(2, 1, x, 3, x), (x, 3, 2, x, 7), (3, 2, 5, x, 1), (2, x, 3, 1, x),
(3, 4, x, 1, x) and (4, x, x, 3, 5). To form the correlation
between the first two variates, we can use only the first, third
and fifth data sets, since only these contain data on both of these
variates. However, note that we will encounter difficulties in
calculating the correlation between the third and fourth variates,
since only the fourth data set contains observations on both variables,
and a correlation cannot be computed from one such pair. The
investigator must watch for this kind of situation. One other condition must
be met before we can proceed to accept in this way numerical material
containing missing data. That is, there must be no relationship between
the magnitude of the missing numbers and the fact of their being missing.
If the missing numbers are all unusually large, or unusually small, then
nothing at all can be done with the data.
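The six data sets of this example can be run through a pairwise-complete correlation to confirm which sets are usable for each pair of variates. The sketch below is an illustration in Python with NumPy; the function name is invented, and missing observations are represented by NaN.

```python
import numpy as np

x = np.nan  # marker for a missing observation
data = np.array([
    [2, 1, x, 3, x],
    [x, 3, 2, x, 7],
    [3, 2, 5, x, 1],
    [2, x, 3, 1, x],
    [3, 4, x, 1, x],
    [4, x, x, 3, 5],
], dtype=float)

def pairwise_correlation(data, a, b):
    """Correlate variates a and b using only data sets where both are observed.

    Returns (r, rows_used). r is None when fewer than two complete pairs
    exist or when either variate is constant over the usable data sets.
    """
    rows = np.flatnonzero(~np.isnan(data[:, a]) & ~np.isnan(data[:, b]))
    if rows.size < 2:
        return None, rows
    u, v = data[rows, a], data[rows, b]
    if np.std(u) == 0 or np.std(v) == 0:
        return None, rows
    return float(np.corrcoef(u, v)[0, 1]), rows

r12, rows12 = pairwise_correlation(data, 0, 1)   # first and second variates
r34, rows34 = pairwise_correlation(data, 2, 3)   # third and fourth variates
```

With zero-based row indices, the first pair uses rows 0, 2 and 4 (the first, third and fifth data sets), while the second pair finds only row 3 and so returns no coefficient.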

b. Relevance: Factor analysis will be much improved if the investigator
has some intelligent suspicions as to what factors might emerge.
In such a situation, the most desirable thing is to choose variables
which will yield the factor if it exists. Thus if a range of scientific
ability is expected as a factor, then we should incorporate variables
on physics, chemistry, art and music, with the hope that one factor
will have positive weight on the first two and negative weights on the
last two, either before or after rotation. Of course, factor weightings
are non-directional and the signs of the weights may be reversed,
yielding in effect an anti-scientific factor. This will be due to
the arbitrariness of the calculations, and the investigator can change
all the signs before publishing the results, in order to be able to
provide psychologically meaningful names for the factors. Even the
major factor, the general intelligence one in any examination test
data, may have negative weightings on all the items and thus measure
general stupidity instead of general intelligence. Each factor is a
dimension, such as stupidity-intelligence, and we may refer to the
factor by either pole of the dimension, or by both if the opposite
polarity is not clear from the context. Guilford has suggested
collecting three variables for each factor suspected to exist, and
this number three should be regarded as a minimum.

c. Factorial simplicity: Ideally, each variable should contribute
to a very significant degree to only one underlying factor;
otherwise the factorial structure of the data is rendered very complex,
and even rotation will fail to clarify the factors into meaningful
psychological entities. The foregoing is formulated in terms of the
school grade analogy, but the situation is the same in any field of
investigation.

d. Unbiasedness: The data must, insofar as possible, constitute
a random sample of the population whose factor structure we
are trying to describe. That is, each element in the population
should have the same opportunity as any other element to be
incorporated into the sample.

e. Linearity: Raw data are not used directly in a factor
analysis. Rather, the relationship between all possible pairs of
variates, as measured by the product-moment correlation coefficient,
is employed. Other measures of correlation should not be used.
The important thing to note here is that the product-moment
correlation coefficient measures the strength of the linear relationship
between two variates. If the relationship is not linear, but is,
say, curvilinear, the coefficient can be calculated but there will
be distortion and bias in any factors which are calculated from
such deceptive coefficients. Note a very important distinction
here: the relationships between the variates must be linear, but
there must not be a linear dependency between variates; in effect,
one variate cannot be the sum or the weighted sum of two or more
other variates.

f. Editing: Often one will be tempted to throw away data
which do not fall in line with the rest of the material. The
guiding principle here is that one can reject such data only if one can
be sure that one will not be tempted in the future to apply the
results to other data which are similarly out of line.
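The warning under (e) against a linear dependency between variates can be illustrated numerically. In the sketch below (synthetic data, chosen only for illustration) a third variate is constructed as the sum of two others, and the resulting correlation matrix has a characteristic root of zero, so it cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(2)               # arbitrary seed
a = rng.normal(size=100)
b = rng.normal(size=100)
c = a + b                                    # third variate is the sum of the first two
R = np.corrcoef(np.vstack([a, b, c]))        # rows are variates here
smallest_root = np.linalg.eigvalsh(R)[0]     # roots are returned in ascending order
```

A singular correlation matrix of this kind would defeat any later step that requires R inverse, such as the estimation of factor scores.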
Appendix IV

THE  REFERENCE  GUIDE  TO  FACTOR  ANALYSIS 

INTRODUCTION  TO  THE  REFERENCE  GUIDE 

A  factor  analysis  provides  a  description  of  n  variables  by  a 
linear  combination  of  m  hypothetical  factors.  The  reference  guide  is 
designed  to  help  a  scientist  to  obtain  such  a  representation.  Each  step 
presents  a  decision  to  be  made  by  the  user  and  refers  to  subsections  and 
appendices  of  the  report  which  will  help  him  make  these  decisions. 


A. DESIGN OF EXPERIMENT

1. Choose linearly related variables ........ App. 3.2e

2. Randomly sample the observations
   on the variables ......................... App. 3.2d

3. Choose numbers of variables and
   observations within computational
   bounds ................................... App. 1

4. Choose only normally distributed
   variables if any statistical
   factor analytic techniques will
   be used .................................. 2.5; 4.5

5. Choose an appropriate number of
   variables for a hypothesized number
   of factors ............................... App. 3.2b; App. 3.2c

6. Select an appropriate number of
   observations for a given set of
   variables ................................ 4.6; $.3; App. 3.1

B. PROCESSING RAW DATA

1. Decide on the correlation coefficient
   to be used ............................... 2.3

   a. For quantitative data ................. 2.3A; 2.3C
   b. For ranked data ....................... 2.3A
   c. For dichotomized data ................. 2.3B; 2.3C

2. Treat missing data by three
   available methods ........................ 2.6

3. Decide whether to scale the
   correlation coefficients ................. 3.4

4. Compute the correlation matrix
   in proper format ......................... App. 1.A


C. THE FACTOR ANALYSIS

1. Choose the factor analysis technique
   to be used ............................... 4.3

   a. Principal-factor technique ............ App. 1.A
   b. Centroid technique .................... Refs. 10, 66

2. Decide upon the communality values ....... 4.4

   a. For N > 40, choose unities ............ App. 1.A
   b. For N < 40 and for interpretive
      purposes, choose squared multiple
      correlations .......................... 2.7; App. 1.A
   c. For N < 40 and data reduction
      purposes (preservation of gramian
      properties 3.3), choose the method
      of .................................... 4.4

D. ROTATION

1. Decide whether to rotate ................. 5.2

   a. If purpose of factor analysis is
      data reduction: no rotation
   b. If purpose of factor analysis is
      interpretation: rotation

2. Choose the number of factors to rotate ... 4.5

3. Select the kind of rotation technique .... 5.3

   a. Orthogonal rotation (Varimax), if
      uncorrelated, that is independent,
      factors are hypothesized .............. App. 1.B
   b. Oblique rotation (Oblimax), if
      correlated, that is dependent,
      factors are hypothesized .............. App. 1.C
E. INTERPRETATION

1. Orthogonal case .......................... 5.3; 7.2

2. Oblique case ............................. 5.4

F. USING THE FACTORS ........................ 4.7; 7.3; App. 1.D
Appendix V
GLOSSARY 


Bi-factor solution: a solution where the variables are described by a
general factor, uncorrelated group factors, and a unique factor each.

Biserial correlation coefficient: a bivariate correlation coefficient
where one variable is dichotomized and one variable has quantitative
scores.

Centroid  solution:  a  close  approximation  to  the  principal-factor 

solution  with  considerable  saving  in  labor,  where  the  n  variables 
are  described  as  well  by  m  common  and  n  unique  factors. 

Common  factor:  a  factor  present  in  more  than  one  variable  of  a  set  of 
variables. 

Common-factor  space:  the  space  of  m  common  factors. 

Communality  of  a  variable:  the  sum  of  the  squared  common  factor  loadings 
of  the  variable;  or,  the  contribution  of  the  common  factors  to  the 
total  unit  variance  of  the  variable;  or,  common-factor  variance. 

Complete  correlation  matrix:  a  correlation  matrix  with  ones  in  the  main 
diagonal. 

Complete  factor  pattern:  a  factor  pattern  which  represents  the  total 
unit  variance  of  each  variable. 

Completeness  of  factorization:  the  problem  of  when  to  stop  factoring, 
that  is  when  to  stop  extracting  factors. 

Completeness  test:  a  test  to  check  for  completeness  of  factorization. 
Complexity of a variable: the number of common factors involved in the
description  of  a  variable. 

Contingency  coefficient:  a  bivariate  correlation  coefficient,  where  both 
variables  are  classified  into  two  or  more  categories. 

Correlation  coefficient:  the  coefficient  describing  the  linear  inter¬ 
relationship  of  two  variables. 

Correlation matrix: a real, symmetric square matrix R, whose elements
$r_{jk}$ are the correlation coefficients between standardized variables
$z_j$ and $z_k$.

Covarimin: an oblique rotation method.

Dichotomized  variable:  a  variable  which  is  given  by  its  frequencies  in 
two  classes. 

Error  factor:  see  specific  factor. 

Factor:  factors  are  defined  as  the  hypothetical  constructs  or  hypothetical 
variables  in  terms  of  which  a  variable  is  linearly  represented. 

Factor analysis: the analysis of a set of variables into a set of common
and unique factors by factoring the correlation matrix of those
variables.

Factoring  problem:  the  problem  of  factoring  a  given  correlation  matrix 
into  a  factor  matrix  with  an  arbitrary  reference  frame. 

Factor  loading:  same  as  loading  of  a  factor. 

Factor matrix: the matrix of factor loadings.

Factor method: a method to factor a correlation matrix in order to
obtain a representation of a set of variables in terms of factors.

Factor model: the factor model is given by a set of n equations
describing n variables in terms of m common and n unique
factors under the assumption that the variables are linearly composed
of the factors.

Factor pattern: the set of equations describing a set of n variables
in terms of m common and n unique factors; sometimes only the
table of factor loadings with the factor designations at the head
of the columns is referred to as a pattern.

Factor  score:  the  elements  of  a  factor  vector. 

Factor  solution:  a  solution  to  a  given  factoring  problem;  often  the 
factor  methods  are  called  factor  solutions. 

Factor  structure:  a  factor  structure  is  a  table  of  correlations  between
the  variables  and  the  factors.

Four-point  coefficient:  same  as  φ-coefficient.

General  factor:  a  factor  present  in  all  variables  of  a  set  of  variables. 

Gramian  matrix:  a  symmetric,  positive  semidefinite  matrix,  where  a
symmetric  matrix  R  is  a  matrix  for  which  R  =  R^T  holds.
R^T  represents  thereby  the  matrix  with  rows  and  columns  of  R
interchanged,  called  the  transpose  of  R.  Positive  semidefiniteness
of  a  matrix  is  defined  as  the  property  of  a  matrix  to  have  only
positive  or  zero  principal  minors.
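
The two conditions can be checked directly from the definition. The sketch below is illustrative only: the function names, the tolerance, and the example matrices are assumptions, and the enumeration of all principal minors is practical only for small matrices.

```python
from itertools import combinations

def determinant(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total

def is_symmetric(r, tol=1e-12):
    """True if R equals its transpose within the tolerance."""
    n = len(r)
    return all(abs(r[j][k] - r[k][j]) <= tol for j in range(n) for k in range(n))

def is_gramian(r, tol=1e-12):
    """True if r is symmetric and every principal minor is positive or zero."""
    if not is_symmetric(r, tol):
        return False
    n = len(r)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            # Principal submatrix: the same index set for rows and columns.
            sub = [[r[j][k] for k in idx] for j in idx]
            if determinant(sub) < -tol:
                return False
    return True

# Hypothetical matrices for illustration.
valid = [[1.0, 0.5], [0.5, 1.0]]
invalid = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
```

Here `is_gramian(valid)` holds, while `invalid` fails because its third-order principal minor is negative even though every entry is an admissible correlation.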

Group  factor:  a  factor  present  in  more  than  one  but  not  in  all  variables
of  a  set  of  variables.

Indeterminacy  in  factor  analysis:  referring  to  the  infinitude  of  factor
solutions  accounting  for  the  factorization  of  an  observed  correlation
matrix.

Kaiser-Dickman  Method:  an  oblique  rotation  method. 

Kendall's  τ-correlation  coefficient:  a  bivariate  correlation  coefficient
for  ranked  data.

Loadings  of  a  factor:  the  coefficients  of  the  factors  in  the  representation 
of  variables  by  the  factors. 

Multiple-factor  solution:  this  solution  is  obtained  by  transformation
(rotation)  of  a  principal-factor  or  centroid  solution  according  to
the  principles  of  simple  structure.

Multiple-group  solution:  a  factor  solution  in  which  several  common
factors  are  extracted  in  one  operation,  where  these  factors  can  be
oblique.

Oblimax:  an  oblique  rotation  method. 

Oblimin:  an  oblique  rotation  method. 

Oblique  rotation  method:  the  reference  frame  after  rotation  is  an 
oblique  one. 

Observation  =  measurement  =  subject  =  object  =  individual. 

Observed  correlation  coefficient:  a  correlation  coefficient  computed 
from  observed  data. 

Orthogonal  rotation  method:  the  reference  frame  after  rotation  is  an
orthogonal  one.

Pattern:  same  as  factor  pattern.

Pearson’s  product-moment  correlation  coefficient:  a  bivariate  correlation 
coefficient  for  quantitative  measurements. 

φ-coefficient:  a  bivariate  correlation  coefficient  for  truly  dichotomized
variables.

Positive  semidefiniteness:  see  Gramian. 

Preferred  position  of  a  reference  frame:  a  reference  frame  for  which  the
factor  pattern  has  a  certain  prescribed  format,  where  this  format  can
be  given  in  different  ways,  for  example  by  the  simple  structure
criteria.

Principal  component  solution:  a  principal-factor  solution  of  a  complete 
correlation  matrix;  there  are  no  unique  factors. 

Principal-factor  solution:  an  orthogonal  solution,  where  the  variables
are  described  by  m  common  and  n  unique  factors;  the  reduced
correlation  matrix  is  factored.

Product-moment  correlation  coefficient:  same  as  Pearson's  product-moment 
correlation  coefficient. 

Quartimax:  an  orthogonal  rotation  method.

Quartimin:  an  oblique  rotation  method.

Rank:  if  N  objects  are  arranged  in  an  order  according  to  some  property,
which  they  all  possess  in  a  varying  degree,  the  objects  are  said  to
be  ranked;  each  object  has  a  rank  expressed  as  a  natural  number
between  1  and  N.

Rank  of  a  matrix:  the  rank  of  a  matrix  is  the  number  of  rows  (or  columns)
of  the  largest  square  submatrix  whose  determinant  is  not  zero.

Reduced  correlation  matrix:  a  correlation  matrix  with  communalities  in 
the  main  diagonal. 

Reduced  factor  pattern:  a  factor  pattern  which  represents  the  common
factor  variance  of  each  variable.

Reference  axes:  geometrical  interpretation  of  the  factors  for  rotation;
the  configuration  of  the  reference  axes  can  be  oblique  or  orthogonal.

Reference  frame:  the  frame  of  reference  axes. 

Reproduced  correlation  coefficient:  a  correlation  coefficient  reproduced 
from  the  pattern  of  factor  loadings. 

Residual  correlation  coefficient:  a  correlation  coefficient  computed  as 
the  difference  between  an  observed  and  a  corresponding  reproduced 
correlation  coefficient. 

Residual  matrix:  a  matrix  whose  entries  are  the  residual  correlation 
coefficients. 
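
The last three definitions can be illustrated together. The sketch below assumes orthogonal common factors, in which case the reproduced correlations are the sums of products of loadings, R_hat = A A^T; the function names, the loadings, and the observed matrix are hypothetical values chosen for illustration.

```python
def reproduced_correlations(loadings):
    """R_hat = A A^T for orthogonal factors; rows of A are variables."""
    return [[sum(a * b for a, b in zip(row_j, row_k)) for row_k in loadings]
            for row_j in loadings]

def residual_matrix(observed, reproduced):
    """Element-wise difference: observed minus reproduced."""
    return [[o - r for o, r in zip(obs_row, rep_row)]
            for obs_row, rep_row in zip(observed, reproduced)]

# Hypothetical example: three variables, one common factor.
loadings = [[0.8], [0.7], [0.6]]
observed = [
    [1.00, 0.58, 0.46],
    [0.58, 1.00, 0.44],
    [0.46, 0.44, 1.00],
]
reproduced = reproduced_correlations(loadings)
residuals = residual_matrix(observed, reproduced)
# Off-diagonal residuals are about +0.02, -0.02 and +0.02. Because unities
# (not communalities) stand in the diagonal of `observed`, the diagonal
# residuals here equal 1 minus the sum of squared loadings of each variable.
```

Small off-diagonal residuals indicate that the loadings account for the observed correlations; how small is small enough is a separate question that this sketch does not address.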


Rotation:  procedure  to  re-orient  the  arbitrary  reference  axes,  determined
by  the  method  of  factoring  the  correlation  matrix,  to  some  position 
useful  for  the  interpretation  of  factors. 

Rotational  problem:  the  problem  of  rotating  the  arbitrary  reference  frame, 
obtained  as  the  result  of  factoring  the  correlation  matrix,  into  a 
preferred  position. 

Rotation  method:  same  as  rotation  technique. 

Rotation  technique:  a  technique  to  solve  the  rotational  problem;  there
are  orthogonal  and  oblique  rotation  techniques.

Simple  structure:  a  format  of  the  factor  pattern,  established  by  Thurstone,
as  the  goal  of  rotation,  observing  several  criteria.

Spearman's  rank  correlation:  a  bivariate  correlation  coefficient,  where
both  variables  are  ranked.

Spearman's  rank  difference  method:  same  as  Spearman's  rank  correlation. 

Spearman's  ρ-correlation  coefficient:  same  as  Spearman's  rank  correlation.

Specific  factor:  results  from  decomposing  the  uniqueness  of  a  variable 
into  two  portions  of  variance — that  due  to  the  particular  variable 
set  and  that  due  to  error  in  measurement.  Correspondingly  two 
factors  are  defined:  the  specific  factor  and  the  error  factor. 

Standardized  variable:  a  variable  whose  mean  is  zero  and  whose  standard 
deviation  is  one. 

Structure:  same  as  factor  structure.

Symmetric  matrix:  see  Gramian. 

Tetrachoric  correlation  coefficient:  a  bivariate  correlation  coefficient,
where  both  variables  are  dichotomized.

Thorndike's  median  ratio  coefficient  of  correlation:  a  bivariate  correlation 
coefficient  for  quantitative  data. 

Total  contribution  of  a  factor  to  the  variances  of  all  variables:  the  sum
of  squared  loadings  of  all  variables  on  that  factor.
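
As a brief illustration, the total contribution of each factor is the column sum of squared loadings in the factor matrix. The function name and the loadings below are hypothetical.

```python
def total_contributions(loadings):
    """Return V_p = sum over variables j of a_jp ** 2, one value per factor."""
    n_factors = len(loadings[0])
    return [sum(row[p] ** 2 for row in loadings) for p in range(n_factors)]

# Hypothetical factor matrix: four variables (rows), two factors (columns).
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
]
contributions = total_contributions(loadings)  # approximately [1.18, 1.22]
```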

Total-factor  space:  the  space  of  m  common  and  n  unique  factors.

Trace  of  a  matrix:  the  sum  of  diagonal  values  of  a  matrix.

Two-factor  solution:  a  solution,  where  all  variables  are  described  by
one  general  factor  and  one  unique  factor  each.

Uni-factor  solution:  an  orthogonal  factor  solution,  where  groups  of 
variables  are  each  described  by  only  one  factor. 

Unique  factor:  a  factor  present  in  a  single  variable  of  a  set  of
variables.

Uniqueness:  the  contribution  of  the  unique  factor  of  a  variable  to  the
unit  variance  of  that  variable.
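
A short sketch of the computation, under the assumption of orthogonal common factors and standardized variables: the uniqueness of a variable is then what remains of its unit variance after the squared common-factor loadings (its communality) are subtracted. The function name and the loadings are hypothetical.

```python
def uniquenesses(loadings):
    """Return 1 minus the sum of squared common-factor loadings, per variable."""
    return [1.0 - sum(a * a for a in row) for row in loadings]

# Hypothetical factor matrix: four variables (rows), two common factors (columns).
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
]
unique_parts = uniquenesses(loadings)  # approximately [0.35, 0.47, 0.15, 0.63]
```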

Uniqueness  of  a  solution:  the  problem  referring  to  discrepancies  of  two 
factor  solutions  due  to  sampling  effects. 

Variable:  a  vector  of  N  observed  values  where  N  is  the  number  of
observations.

Varimax:  an  orthogonal  rotation  method. 

Yule's  coefficient  of  association:  a  bivariate  correlation  coefficient 
for  dichotomized  data. 

Yule's  coefficient  of  colligation:  a  bivariate  correlation  coefficient 
for  dichotomized  data. 